Here, we discuss Fisher’s exact contingency table tests in R with interpretations, including p-values and odds ratios.
Fisher’s exact contingency table test can be performed in R with the fisher.test() function from the base "stats" package.
Fisher’s exact test of independence can be used to test whether the row variable (with \(r\geq2\) rows) and the column variable (with \(c\geq2\) columns) of a contingency table are independent, as stated in the null hypothesis.
In Fisher’s exact test, with the row and column totals held fixed, the distribution of the cell values follows the hypergeometric distribution when the null hypothesis is true.
| Question | Are the row and column variables independent? |
|---|---|
| Null Hypothesis, \(H_0\) | The row and column variables are independent; hence, the row (column) cell proportions are equal for all rows (columns). |
| Alternate Hypothesis, \(H_1\) | The row and column variables are dependent; hence, at least one row’s (or column’s) cell proportions differ. |
| Color \ Status | Off | On | Total |
|---|---|---|---|
| Green | 8 | 12 | 20 |
| Yellow | 4 | 14 | 18 |
| Total | 12 | 26 | 38 |
# Create the data for the Fisher's exact contingency table test
data = rbind(c(8, 12), c(4, 14))
# Run the Fisher's exact contingency table test with specifications
fisher.test(data,
alternative = "two.sided",
conf.level = 0.95)
Fisher's Exact Test for Count Data
data: data
p-value = 0.3067
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.464589 13.108777
sample estimates:
odds ratio
2.281547
Or:
# Create the data for the Fisher's exact contingency table test
green = c(off = 8, on = 12)
yellow = c(off = 4, on = 14)
rbind(green, yellow)
# Run the Fisher's exact contingency table test with specifications
fisher.test(rbind(green, yellow),
alternative = "two.sided",
conf.level = 0.95)
Or:
# Create the data for the Fisher's exact contingency table test
x = c(rep("Green", 20), rep("Yellow", 18))
y = c(rep("Off", 8), rep("On", 12),
rep("Off", 4), rep("On", 14))
table(x, y)
data.frame(x, y)
# Run the Fisher's exact contingency table test with specifications
fisher.test(x, y,
alternative = "two.sided",
conf.level = 0.95)
| Argument | Usage |
|---|---|
| x | A two-dimensional matrix of counts, or a factor object |
| y | For x as a factor, y is a factor of the same length; ignored if x is a matrix |
| alternative | Set the alternative hypothesis as "greater", "less", or the default "two.sided" |
| conf.int | Set to FALSE to suppress the confidence interval for the odds ratio; applicable to 2x2 matrices only (default = TRUE) |
| conf.level | Level of confidence for the odds ratio confidence interval (default = 0.95); applicable to 2x2 matrices only |
# Create object
fsh_object = fisher.test(rbind(c(8, 12), c(4, 14)))
# Extract a component
fsh_object$p.value
[1] 0.3066857
| Test Component | Usage |
|---|---|
| fsh_object$p.value | P-value |
| fsh_object$estimate | Odds ratio estimate; for 2x2 tables only |
| fsh_object$conf.int | Confidence interval for the odds ratio; for 2x2 tables only |
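Continuing the example above, the other components listed can be extracted the same way:

```r
# Fisher's exact test object for the 2x2 table from above
fsh_object = fisher.test(rbind(c(8, 12), c(4, 14)))

# Odds ratio estimate (a conditional maximum likelihood estimate)
fsh_object$estimate

# Confidence interval for the odds ratio
fsh_object$conf.int
```

These return the values shown in the earlier output: an odds ratio estimate of 2.281547 and a 95% confidence interval of [0.464589, 13.108777].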
In Fisher’s exact tests, for contingency tables with \(r\) rows and \(c\) columns, conditional on the row and column totals being fixed, the probability of the cell values, based on the hypergeometric distribution, takes the form:
\[p_{table} = \frac{(R_{1}! ~\times~ R_{2}! ~\times~\cdots ~\times~R_{r}!)(C_{1}! ~\times~ C_{2}! ~\times~\cdots ~\times~C_{c}!)}{N!~\times~\prod n_{ij}!}\]
where \(R_i\) is the total for row \(i\), and \(C_j\) is the total for column \(j\),
the \(n_{ij}\)’s are the values in cell \(ij\), on row \(i\) and column \(j\),
\(N\) is the total number of observations,
\(a!\), the \(\tt{factorial\; operator}\), equals \(a\times(a-1)\times(a-2)\times\cdots\times 1\),
and \(\prod\) is the \(\tt{product \; operator}\).
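As a quick numerical check (a minimal sketch using the Color/Status table above; the names num, denom, and p_table are illustrative), the formula can be evaluated directly and compared with R’s hypergeometric density dhyper():

```r
# Row totals (20, 18), column totals (12, 26), N = 38, cells (8, 12, 4, 14)
num = prod(factorial(c(20, 18, 12, 26)))
denom = factorial(38) * prod(factorial(c(8, 12, 4, 14)))
p_table = num / denom
p_table

# The same probability from the hypergeometric distribution:
# 8 of the 12 "Off" units falling in the 20-unit "Green" row
dhyper(8, 12, 26, 20)
```

Both expressions give the same conditional probability for this table, illustrating the hypergeometric form of \(p_{table}\).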
| Level \ Position | In | Out | Total |
|---|---|---|---|
| Up | a | b | a+b |
| Down | c | d | c+d |
| Total | a+c | b+d | n |
\[\begin{align} p_{table} &=\frac{ \displaystyle{{a+b}\choose{a}} \displaystyle{{c+d}\choose{c}} }{ \displaystyle{{n}\choose{a+c}} } = \frac{ \displaystyle{{a+b}\choose{b}} \displaystyle{{c+d}\choose{d}} }{ \displaystyle{{n}\choose{b+d}} } \\ & = \frac{ \displaystyle{{a+c}\choose{a}} \displaystyle{{b+d}\choose{b}} }{ \displaystyle{{n}\choose{a+b}} } = \frac{ \displaystyle{{a+c}\choose{c}} \displaystyle{{b+d}\choose{d}} }{ \displaystyle{{n}\choose{c+d}} } \\ & = \frac{(a+b)!~(c+d)!~(a+c)!~(b+d)!}{n!~a!~b!~c!~d!},\end{align}\]
where the \(\tt{binomial\; coefficient}\) \(\binom{a}{b}\) equals \(\frac{a!}{b!(a-b)!}\),
and \(a!\), the \(\tt{factorial\; operator}\), equals \(a\times(a-1)\times(a-2)\times\cdots\times 1\).
See also chi-squared contingency table tests for chi-squared approximations.
Consider a test of independence between treatment and success among 22 patients.
| Treatment \ Success | Yes | No | Total |
|---|---|---|---|
| A | 6 | 4 | 10 |
| B | 3 | 9 | 12 |
| Total | 9 | 13 | 22 |
We test the following null hypothesis \(H_0\) against the alternative hypothesis \(H_1\), at the level of significance \(\alpha=0.05\):
\(H_0:\) the row (treatment) and column (success) variables are independent.
\(H_1:\) the row (treatment) and column (success) variables are dependent.
Because the level of significance is \(\alpha=0.05\), the level of confidence is \(1 - \alpha = 0.95\).
The fisher.test() function has the default alternative as "two.sided" and, for 2x2 tables, the default level of confidence as 0.95; hence, you do not need to specify the "alternative" and "conf.level" arguments in this case.
# Run the Fisher's exact contingency table test
# (defaults: alternative = "two.sided", conf.level = 0.95)
fisher.test(rbind(c(6, 4), c(3, 9)))
Fisher's Exact Test for Count Data
data: rbind(c(6, 4), c(3, 9))
p-value = 0.192
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.5461764 41.0676400
sample estimates:
odds ratio
4.174025
From the output:
the p-value, \(p\), is 0.192,
the sample odds ratio is 4.174025,
and the confidence interval is [0.5461764, 41.0676400].
Note that for fisher.test() in R, the p-value may disagree with the confidence interval in some edge cases. This is because the p-value is computed directly from the hypergeometric distribution, while the odds ratio and its confidence interval are based on the conditional maximum likelihood estimate.
P-value: With the p-value (\(p = 0.192\)) being greater than the level of significance 0.05, we fail to reject the null hypothesis that the row (treatment) and column (success) variables are independent. Hence, there is insufficient evidence that treatment affects success.
Confidence Interval: With the null hypothesis odds ratio (\(\text{ratio} = 1\)) being inside the confidence interval, \([0.5461764, 41.0676400]\), we fail to reject the null hypothesis that the row (treatment) and column (success) variables are independent. Hence, there is insufficient evidence that treatment affects success.
Consider a test of independence between detergent and cleanliness for 30 washed clothes.
| Detergent \ Cleanliness | High | Medium | Low | Dirty | Total |
|---|---|---|---|---|---|
| A | 5 | 3 | 1 | 0 | 9 |
| B | 1 | 0 | 4 | 3 | 8 |
| C | 0 | 1 | 2 | 1 | 4 |
| D | 1 | 2 | 5 | 1 | 9 |
| Total | 7 | 6 | 12 | 5 | 30 |
We test the following null hypothesis \(H_0\) against the alternative hypothesis \(H_1\), at the level of significance \(\alpha=0.1\):
\(H_0:\) the row (detergent) and column (cleanliness) variables are independent.
\(H_1:\) the row (detergent) and column (cleanliness) variables are dependent.
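The output below was produced by the following call (reconstructed from the data line of the output); for tables larger than 2x2, fisher.test() reports only a p-value, with no odds ratio or confidence interval:

```r
# Run the Fisher's exact contingency table test on the 4x4 table
result = fisher.test(rbind(c(5, 3, 1, 0), c(1, 0, 4, 3),
                           c(0, 1, 2, 1), c(1, 2, 5, 1)))
result
```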
Fisher's Exact Test for Count Data
data: rbind(c(5, 3, 1, 0), c(1, 0, 4, 3), c(0, 1, 2, 1), c(1, 2, 5, 1))
p-value = 0.06551
alternative hypothesis: two.sided
P-value: With the p-value (\(p = 0.06551\)) being less than the level of significance 0.1, we reject the null hypothesis that the row (detergent) and column (cleanliness) variables are independent. Hence, detergent impacts cleanliness.
Confidence Interval: This is only applicable to 2x2 tables.
Consider a test of independence between coach and success among 16 athletes.
| Coach \ Success | Yes | No | Total |
|---|---|---|---|
| A | 7 | 2 | 9 |
| B | 3 | 4 | 7 |
| Total | 10 | 6 | 16 |
We test the following null hypothesis \(H_0\) against the alternative hypothesis \(H_1\), at the level of significance \(\alpha=0.05\):
\(H_0:\) the row (coach) and column (success) variables are independent.
\(H_1:\) the row (coach) and column (success) variables are dependent.
Because the level of significance is \(\alpha=0.05\), the level of confidence is \(1 - \alpha = 0.95\).
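The output below was produced by the following right-tailed call (reconstructed from the output, which shows alternative "greater than 1" and the default conf.level of 0.95):

```r
# Right-tailed Fisher's exact test for the Coach/Success table
result = fisher.test(rbind(c(7, 2), c(3, 4)), alternative = "greater")
result
```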
Fisher's Exact Test for Count Data
data: rbind(c(7, 2), c(3, 4))
p-value = 0.1818
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
0.4920756 Inf
sample estimates:
odds ratio
4.193432
P-value: With the p-value (\(p = 0.1818\)) being greater than the level of significance 0.05, we fail to reject the null hypothesis that the row (coach) and column (success) variables are independent.
Confidence Interval: With the null hypothesis odds ratio (\(\text{ratio} = 1\)) being inside the confidence interval, \([0.4920756, \infty)\), we fail to reject the null hypothesis that the row (coach) and column (success) variables are independent.
For the following generic 2x2 two-way contingency table, using the conditional probability of the table values, \(p_{table}\), given that the row and column totals are fixed, we can derive p-values.
| Direction \ Response | Right | Left | Total |
|---|---|---|---|
| Yes | a | b | a+b |
| No | c | d | c+d |
| Total | a+c | b+d | n |
\[p_{table} = \frac{(a+b)!~(c+d)!~(a+c)!~(b+d)!}{n!~a!~b!~c!~d!}\]
For 2x2 tables, the right-tailed p-value is the sum of the probabilities of all tables (with the same row and column totals) whose odds ratio \([(a/b)/(c/d)]\) is greater than or equal to that of the observed Coach/Success table (Table R1) above, so we add up the probabilities of such tables.
To increase the odds ratio, we need to increase cell \(a\), keeping the row and column totals fixed. Only Table R2 and Table R3 below satisfy this requirement; hence, we add up the probabilities of Table R1, Table R2, and Table R3.
Table R2:

| Coach \ Success | Yes | No | Total |
|---|---|---|---|
| A | 8 | 1 | 9 |
| B | 2 | 5 | 7 |
| Total | 10 | 6 | 16 |
Table R3:

| Coach \ Success | Yes | No | Total |
|---|---|---|---|
| A | 9 | 0 | 9 |
| B | 1 | 6 | 7 |
| Total | 10 | 6 | 16 |
# Probability of Table R1 (cells 7, 2, 3, 4)
num1 = prod(factorial(c(9, 7, 10, 6)))
denom1 = factorial(16)*prod(factorial(c(7, 2, 3, 4)))
prob1 = num1/denom1
# Probability of Table R2 (cells 8, 1, 2, 5)
num2 = prod(factorial(c(9, 7, 10, 6)))
denom2 = factorial(16)*prod(factorial(c(8, 1, 2, 5)))
prob2 = num2/denom2
# Probability of Table R3 (cells 9, 0, 1, 6)
num3 = prod(factorial(c(9, 7, 10, 6)))
denom3 = factorial(16)*prod(factorial(c(9, 0, 1, 6)))
prob3 = num3/denom3
Hence the derived right-tailed p-value is:
prob1 + prob2 + prob3
[1] 0.1818182
This is equal to the p-value obtained above from the fisher.test() function.
Consider a test of independence between color and texture for 15 pebbles.
| Color \ Texture | Rough | Smooth | Total |
|---|---|---|---|
| Red | 1 | 7 | 8 |
| Black | 5 | 2 | 7 |
| Total | 6 | 9 | 15 |
We test the following null hypothesis \(H_0\) against the alternative hypothesis \(H_1\), at the level of significance \(\alpha=0.1\):
\(H_0:\) the row (color) and column (texture) variables are independent.
\(H_1:\) the row (color) and column (texture) variables are dependent.
Because the level of significance is \(\alpha=0.1\), the level of confidence is \(1 - \alpha = 0.9\).
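The output below was produced by the following left-tailed call (reconstructed from the output, which shows alternative "less than 1" and a 90 percent confidence interval):

```r
# Left-tailed Fisher's exact test for the Color/Texture table at 90% confidence
result = fisher.test(rbind(c(1, 7), c(5, 2)),
                     alternative = "less", conf.level = 0.9)
result
```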
Fisher's Exact Test for Count Data
data: rbind(c(1, 7), c(5, 2))
p-value = 0.03497
alternative hypothesis: true odds ratio is less than 1
90 percent confidence interval:
0.0000000 0.5912598
sample estimates:
odds ratio
0.07356123
P-value: With the p-value (\(p = 0.03497\)) being less than the level of significance 0.1, we reject the null hypothesis that the row (color) and column (texture) variables are independent. Hence, color and texture are associated.
Confidence Interval: With the null hypothesis odds ratio (\(\text{ratio} = 1\)) being outside the confidence interval, \((0.0000000, 0.5912598]\), we reject the null hypothesis that the row (color) and column (texture) variables are independent. Hence, color and texture are associated.
For the following generic 2x2 two-way contingency table, using the conditional probability of the table values, \(p_{table}\), given that the row and column totals are fixed, we can derive p-values.
| Level \ Position | In | Out | Total |
|---|---|---|---|
| Up | a | b | a+b |
| Down | c | d | c+d |
| Total | a+c | b+d | n |
\[p_{table} = \frac{(a+b)!~(c+d)!~(a+c)!~(b+d)!}{n!~a!~b!~c!~d!}\]
For 2x2 tables, the left-tailed p-value is the sum of the probabilities of all tables (with the same row and column totals) whose odds ratio \([(a/b)/(c/d)]\) is less than or equal to that of the observed Color/Texture table (Table L1) above, so we add up the probabilities of such tables.
To reduce the odds ratio, we need to reduce cell \(a\), keeping the row and column totals fixed. Only Table L2 below satisfies this requirement; hence, we add up the probabilities of Table L1 and Table L2.
Table L2:

| Color \ Texture | Rough | Smooth | Total |
|---|---|---|---|
| Red | 0 | 8 | 8 |
| Black | 6 | 1 | 7 |
| Total | 6 | 9 | 15 |
# Probability of Table L1 (cells 1, 7, 5, 2)
num1 = prod(factorial(c(8, 7, 6, 9)))
denom1 = factorial(15)*prod(factorial(c(1, 7, 5, 2)))
prob1 = num1/denom1
# Probability of Table L2 (cells 0, 8, 6, 1)
num2 = prod(factorial(c(8, 7, 6, 9)))
denom2 = factorial(15)*prod(factorial(c(0, 8, 6, 1)))
prob2 = num2/denom2
Hence the derived left-tailed p-value is:
prob1 + prob2
[1] 0.03496503
This is equal to the p-value obtained above from the fisher.test() function.
For a two-sided test of independence, consider the following table:
Table T1:

| Present \ Used | Yes | No | Total |
|---|---|---|---|
| Yes | a=7 | b=1 | a+b=8 |
| No | c=2 | d=4 | c+d=6 |
| Total | a+c=9 | b+d=5 | 14 |
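The output below was produced by the following two-sided call (reconstructed from the data line of the output):

```r
# Two-sided Fisher's exact test for the Present/Used table
result = fisher.test(rbind(c(7, 1), c(2, 4)))
result
```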
Fisher's Exact Test for Count Data
data: rbind(c(7, 1), c(2, 4))
p-value = 0.09091
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.6418261 779.1595463
sample estimates:
odds ratio
10.98111
The p-value is 0.09091.
The odds ratio is greater than 1 \([(a/b)/(c/d)>1]\). Hence, calculate the right-tailed p-value as in the example in the section above using the probability for Table T1 and the tables with greater odds ratio, keeping the row and column totals fixed.
# Probability of Table T1 (cells 7, 1, 2, 4)
num1 = prod(factorial(c(8, 6, 9, 5)))
denom1 = factorial(14)*prod(factorial(c(7, 1, 2, 4)))
probr1 = num1/denom1; probr1
[1] 0.05994006
# Probability of the table with cells 8, 0, 1, 5 (greater odds ratio)
num2 = prod(factorial(c(8, 6, 9, 5)))
denom2 = factorial(14)*prod(factorial(c(8, 0, 1, 5)))
probr2 = num2/denom2; probr2
[1] 0.002997003
For the left-tail, reduce the odds ratio (cell \(a\)) to the smallest possible as in Table T2 below. Then add the probabilities from tables with probability less than or equal to the probability of Table T1, as you increase the odds ratio again one step at a time.
Table T2:

| Present \ Used | Yes | No | Total |
|---|---|---|---|
| Yes | a=3 | b=5 | a+b=8 |
| No | c=6 | d=0 | c+d=6 |
| Total | a+c=9 | b+d=5 | 14 |
# Probability of Table T2 (cells 3, 5, 6, 0)
num1 = prod(factorial(c(8, 6, 9, 5)))
denom1 = factorial(14)*prod(factorial(c(3, 5, 6, 0)))
probl1 = num1/denom1; probl1
[1] 0.02797203
# Probability of the next table up (cells 4, 4, 5, 1)
num2 = prod(factorial(c(8, 6, 9, 5)))
denom2 = factorial(14)*prod(factorial(c(4, 4, 5, 1)))
probl2 = num2/denom2; probl2
[1] 0.2097902
The probability for the second table (0.2097902) is greater than the probability for Table T1 (0.0599401), hence, we stop and exclude it.
The derived two-tailed p-value is then the sum of the right-tail and left-tail probabilities above:
probr1 + probr2 + probl1
[1] 0.09090909
This is equal to the p-value obtained above from the fisher.test() function.
A similar approach can be followed for cases where the odds ratio of the starting table is less than 1: compute the left-tailed p-value; find the table with the highest possible odds ratio; then compute the probabilities of tables as you reduce the odds ratio one step at a time. Finally, add the left-tail probabilities and the right-tail probabilities of the tables whose probability is less than or equal to that of the starting table.
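The two-sided procedure above can be sketched generically with dhyper(): sum the probabilities of all tables sharing the observed margins whose probability is at most that of the observed table. A minimal sketch using Table T1; the names support, probs, p_obs, and p_two_sided are illustrative, and the small tolerance factor (as used internally by fisher.test()) guards against floating-point ties:

```r
# Cells of Table T1
a = 7; b = 1; c = 2; d = 4

# Possible values of cell a, keeping all margins fixed and cells nonnegative
support = max(0, a - d):min(a + b, a + c)

# Hypergeometric probability of each such table:
# drawing the a+b row-1 units from a+c column-1 and b+d column-2 units
probs = dhyper(support, a + c, b + d, a + b)
p_obs = dhyper(a, a + c, b + d, a + b)

# Two-sided p-value: sum the probabilities no larger than the observed one
p_two_sided = sum(probs[probs <= p_obs * (1 + 1e-7)])
p_two_sided  # matches the fisher.test() p-value of 0.09091 above
```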
Copyright © 2020 - 2024. All Rights Reserved by Stats Codes