Here, we discuss Fisher’s exact contingency table tests in R, with interpretations, including p-values and odds ratios.

In R, Fisher’s exact contingency table test can be performed with the fisher.test() function from the base "stats" package.

Fisher’s exact contingency table test of independence can be used to test whether the row variable (with \(r\geq2\) rows) and the column variable (with \(c\geq2\) columns) in a contingency table are independent, as stated in the null hypothesis.

In Fisher’s exact contingency table test, the row and column totals are treated as fixed; when the null hypothesis is true, the distribution of the cell values conditional on these totals is hypergeometric.

Fisher’s Exact Contingency Table Test of Independence & Hypotheses
Question Are the row and column variables independent?
Null Hypothesis, \(H_0\) The row and column variables are independent; hence, the cell proportions within each row are the same for all rows (equivalently, the proportions within each column are the same for all columns).
Alternative Hypothesis, \(H_1\) The row and column variables are dependent; hence, the cell proportions of at least one row (or column) differ from those of the others.

Sample Steps to Run a Fisher’s Exact Contingency Table Test:

2x2 Two-way Contingency Table
Color \ Status Off On Total
Green 8 12 20
Yellow 4 14 18
Total 12 26 38
# Create the data for the Fisher's exact contingency table test
data = rbind(c(8, 12), c(4, 14))

# Run the Fisher's exact contingency table test with specifications
fisher.test(data,
            alternative = "two.sided",
            conf.level = 0.95)

    Fisher's Exact Test for Count Data

data:  data
p-value = 0.3067
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  0.464589 13.108777
sample estimates:
odds ratio 
  2.281547 

Or:

# Create the data for the Fisher's exact contingency table test
green = c(off = 8, on = 12)
yellow = c(off = 4, on = 14)
rbind(green, yellow)

# Run the Fisher's exact contingency table test with specifications
fisher.test(rbind(green, yellow),
            alternative = "two.sided",
            conf.level = 0.95)

Or:

# Create the data for the Fisher's exact contingency table test
x = c(rep("Green", 20), rep("Yellow", 18))
y = c(rep("Off", 8), rep("On", 12),
      rep("Off", 4), rep("On", 14))
table(x, y)
data.frame(x, y)

# Run the Fisher's exact contingency table test with specifications
fisher.test(x, y,
            alternative = "two.sided",
            conf.level = 0.95)
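As a quick check, the three input forms above can be compared directly. This is a minimal sketch using only base R; the names m1, m2, p1, p2, and p3 are our own. All three calls analyze the same counts (the character vectors are converted to factors by fisher.test()), so their p-values should agree.

```r
# Form 1: plain matrix of counts
m1 = rbind(c(8, 12), c(4, 14))

# Form 2: matrix built from named row vectors
green = c(off = 8, on = 12)
yellow = c(off = 4, on = 14)
m2 = rbind(green, yellow)

# Form 3: raw observations as two vectors of the same length (38 items)
x = c(rep("Green", 20), rep("Yellow", 18))
y = c(rep("Off", 8), rep("On", 12),
      rep("Off", 4), rep("On", 14))

p1 = fisher.test(m1)$p.value
p2 = fisher.test(m2)$p.value
p3 = fisher.test(x, y)$p.value

# The three p-values, and whether they match
c(p1, p2, p3)
isTRUE(all.equal(p1, p2)) && isTRUE(all.equal(p1, p3))
```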
Table of Some Fisher’s Exact Contingency Table Test Arguments in R
Argument Usage
x A two-dimensional contingency table in matrix form, or a factor
y For x as a factor, y is a factor of the same length (ignored if x is a matrix)
alternative Set the alternative hypothesis as "greater", "less", or the default "two.sided"; used for 2x2 tables only
conf.int Set to FALSE to omit the confidence interval for the odds ratio; applicable to 2x2 tables only (default = TRUE)
conf.level Confidence level for the odds ratio confidence interval (default = 0.95); applicable to 2x2 tables only

Creating a Fisher’s Exact Contingency Table Test Object:

# Create object
fsh_object = fisher.test(rbind(c(8, 12), c(4, 14)))

# Extract a component
fsh_object$p.value
[1] 0.3066857
Table of Some Fisher’s Exact Contingency Table Test Object Outputs in R
Test Component Usage
fsh_object$p.value P-value
fsh_object$estimate Odds ratio (conditional maximum likelihood estimate); for 2x2 tables only
fsh_object$conf.int Confidence interval for the odds ratio; for 2x2 tables only
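The components in the table above can be extracted the same way as the p-value. A minimal sketch using base R; besides the listed components, it lists all component names and reads the "conf.level" attribute that fisher.test() attaches to the interval (the same attribute the printed output uses for its "95 percent confidence interval" line).

```r
fsh_object = fisher.test(rbind(c(8, 12), c(4, 14)))

# Names of all components of the returned "htest" object
names(fsh_object)

# P-value, odds ratio estimate, and its confidence interval
fsh_object$p.value
fsh_object$estimate
fsh_object$conf.int

# Confidence level used for the interval
attr(fsh_object$conf.int, "conf.level")
```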

1 Probability for Fisher’s Exact Contingency Table Tests in R

For an \(r \times c\) contingency table in Fisher’s exact test, conditional on the row and column totals being fixed, the probability of the observed cell values follows the hypergeometric distribution and takes the form:

\[p_{table} = \frac{(R_{1}! ~\times~ R_{2}! ~\times~\cdots ~\times~R_{r}!)(C_{1}! ~\times~ C_{2}! ~\times~\cdots ~\times~C_{c}!)}{N!~\times~\prod n_{ij}!}\]

where \(R_i\) is the total for row \(i\), and \(C_j\) is the total for column \(j\),

\(n_{ij}\) is the value in cell \(ij\), on row \(i\) and column \(j\),

\(N\) is the total number of observations,

\(a!\), the \(\tt{factorial\; operator}\), equals \(a\times(a-1)\times(a-2)\times\cdots\times 1\), with \(0! = 1\),

\(\prod\) is the \(\tt{product \; operator}\), here taken over all cells \(ij\).
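The formula above can be evaluated directly for a table of any size. The sketch below uses a hypothetical helper, p_table(), written for illustration (it is not part of the stats package). It works on the log scale with lfactorial() so that large counts do not overflow, and it is applied here to the 2x2 Green/Yellow table and the 4x4 detergent table used in this tutorial.

```r
# Hypothetical helper: conditional probability of an r x c table
# given its row and column totals, using the formula above on the log scale
p_table = function(tab) {
  tab = as.matrix(tab)
  log_p = sum(lfactorial(rowSums(tab))) +   # log of R_1! ... R_r!
    sum(lfactorial(colSums(tab))) -         # log of C_1! ... C_c!
    lfactorial(sum(tab)) -                  # log of N!
    sum(lfactorial(tab))                    # log of the product of all n_ij!
  exp(log_p)
}

# 2x2 example: the Green/Yellow table
p_table(rbind(c(8, 12), c(4, 14)))

# 4x4 example: the detergent table
# (the probability of this single table, not a p-value)
p_table(rbind(c(5, 3, 1, 0),
              c(1, 0, 4, 3),
              c(0, 1, 2, 1),
              c(1, 2, 5, 1)))
```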

For example, for a 2x2 table:

2x2 Two-way Contingency Table
Level \ Position In Out Total
Up a b a+b
Down c d c+d
Total a+c b+d n

\[\begin{align} p_{table} &=\frac{ \displaystyle{{a+b}\choose{a}} \displaystyle{{c+d}\choose{c}} }{ \displaystyle{{n}\choose{a+c}} } = \frac{ \displaystyle{{a+b}\choose{b}} \displaystyle{{c+d}\choose{d}} }{ \displaystyle{{n}\choose{b+d}} } \\ & = \frac{ \displaystyle{{a+c}\choose{a}} \displaystyle{{b+d}\choose{b}} }{ \displaystyle{{n}\choose{a+b}} } = \frac{ \displaystyle{{a+c}\choose{c}} \displaystyle{{b+d}\choose{d}} }{ \displaystyle{{n}\choose{c+d}} } \\ & = \frac{(a+b)!~(c+d)!~(a+c)!~(b+d)!}{n!~a!~b!~c!~d!},\end{align}\]

where the \(\tt{binomial\; coefficient}\) \(\binom{a}{b}\) equals \(\frac{a!}{b!(a-b)!}\), and \(a!\) is the \(\tt{factorial\; operator}\) defined above.
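The equivalent 2x2 forms can be checked numerically. A minimal sketch, using the treatment table from the next section (a = 6, b = 4, c = 3, d = 9) as an illustrative choice; the names n11, n12, n21, n22 stand in for a, b, c, d so that R's c() function is not shadowed. All five expressions, and R's hypergeometric density dhyper(), should give the same probability.

```r
# Cells of a 2x2 table: a = n11, b = n12, c = n21, d = n22
n11 = 6; n12 = 4; n21 = 3; n22 = 9
n = n11 + n12 + n21 + n22

# The four binomial-coefficient forms, then the factorial form
p_forms = c(
  choose(n11 + n12, n11) * choose(n21 + n22, n21) / choose(n, n11 + n21),
  choose(n11 + n12, n12) * choose(n21 + n22, n22) / choose(n, n12 + n22),
  choose(n11 + n21, n11) * choose(n12 + n22, n12) / choose(n, n11 + n12),
  choose(n11 + n21, n21) * choose(n12 + n22, n22) / choose(n, n21 + n22),
  factorial(n11 + n12) * factorial(n21 + n22) *
    factorial(n11 + n21) * factorial(n12 + n22) /
    (factorial(n) * factorial(n11) * factorial(n12) *
       factorial(n21) * factorial(n22))
)
p_forms

# dhyper(x, m, n, k): x = cell a, m = first row total,
# n = second row total, k = first column total
dhyper(n11, n11 + n12, n21 + n22, n11 + n21)
```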

See also chi-squared contingency table tests for chi-squared approximations.

2 Simple Fisher’s Exact Test of Independence in R

Consider a test of independence between treatment and success among 22 patients.

2x2 Two-way Contingency Table
Treatment \ Success Yes No Total
A 6 4 10
B 3 9 12
Total 9 13 22


We use the following null hypothesis \(H_0\) and alternative hypothesis \(H_1\), with the level of significance \(\alpha=0.05\):

\(H_0:\) the row (treatment) and column (success) variables are independent.

\(H_1:\) the row (treatment) and column (success) variables are dependent.

Because the level of significance is \(\alpha=0.05\), the level of confidence is \(1 - \alpha = 0.95\).

The fisher.test() function uses "two.sided" as the default alternative and, for 2x2 tables, 0.95 as the default level of confidence; hence, you do not need to specify the "alternative" and "conf.level" arguments in this case.

fisher.test(rbind(c(6, 4), c(3, 9)),
            alternative = "two.sided",
            conf.level = 0.95)

Or:

fisher.test(rbind(c(6, 4), c(3, 9)))

    Fisher's Exact Test for Count Data

data:  rbind(c(6, 4), c(3, 9))
p-value = 0.192
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  0.5461764 41.0676400
sample estimates:
odds ratio 
  4.174025 

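From the output, the p-value, \(p\), is 0.192,

the estimated odds ratio is 4.174025 (the conditional maximum likelihood estimate reported by fisher.test(), which differs from the sample odds ratio \((6\times 9)/(4\times 3) = 4.5\)),

and the 95% confidence interval for the odds ratio is [0.5461764, 41.0676400].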
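The odds ratio that fisher.test() reports is the conditional maximum likelihood estimate, which generally differs from the sample odds ratio (a × d)/(b × c). The sketch below compares the two for this table and re-derives the conditional estimate by solving the likelihood equation E(a | ψ) = observed a under the noncentral hypergeometric distribution of cell a. The helper mean_a() and the search interval are our own assumptions for illustration; R's internal routine may search differently, so tiny numerical differences are possible.

```r
tab = rbind(c(6, 4), c(3, 9))
res = fisher.test(tab)

# Sample (unconditional) odds ratio: (a*d)/(b*c) = (6*9)/(4*3)
sample_or = (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Margins and the possible values of cell a given those margins
m = sum(tab[1, ]); n = sum(tab[2, ]); k = sum(tab[, 1])
support = max(0, k - n):min(k, m)
log_base = dhyper(support, m, n, k, log = TRUE)

# Hypothetical helper: mean of cell a under the noncentral
# hypergeometric distribution with odds ratio psi
mean_a = function(psi) {
  w = log_base + support * log(psi)
  w = exp(w - max(w))       # rescale before exponentiating to avoid overflow
  sum(support * w) / sum(w)
}

# Conditional MLE: the psi at which E(a | psi) equals the observed a
cond_mle = uniroot(function(psi) mean_a(psi) - tab[1, 1],
                   interval = c(1e-6, 1e6), tol = 1e-10)$root

c(sample = sample_or, conditional_mle = cond_mle,
  fisher_test = unname(res$estimate))
```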

Interpretation:

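Note that for fisher.test() in R, the p-value and the confidence interval may lead to different conclusions in some edge cases. This is because the two-sided p-value sums the hypergeometric probabilities of all tables that are no more likely than the observed table, while the confidence interval is built from two one-sided tests based on the noncentral hypergeometric distribution, and the reported odds ratio is the conditional maximum likelihood estimate rather than the sample odds ratio.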

  • P-value: With the p-value (\(p = 0.192\)) being greater than the level of significance 0.05, we fail to reject the null hypothesis that the row (treatment) and column (success) variables are independent. Hence, the data do not provide sufficient evidence that treatment affects success.

  • Confidence Interval: With the null hypothesis odds ratio (\(\text{ratio} = 1\)) being inside the confidence interval, \([0.5461764, 41.0676400]\), we fail to reject the null hypothesis that the row (treatment) and column (success) variables are independent. Hence, the data do not provide sufficient evidence that treatment affects success.
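The difference between the p-value and interval approaches can be seen directly. In our reading of fisher.test(), the two-sided interval is formed from two one-sided bounds, each at level (1 − conf.level)/2, whereas the two-sided p-value sums the probabilities of all tables no more likely than the observed one, so it is not simply twice a one-sided p-value. A minimal sketch on the treatment table; the variable names are our own.

```r
tab = rbind(c(6, 4), c(3, 9))

# Two-sided 95% confidence interval
ci_two = fisher.test(tab, conf.level = 0.95)$conf.int

# One-sided 97.5% bounds from each direction
lower = fisher.test(tab, alternative = "greater", conf.level = 0.975)$conf.int[1]
upper = fisher.test(tab, alternative = "less", conf.level = 0.975)$conf.int[2]

c(ci_two)          # two-sided interval (attributes dropped)
c(lower, upper)    # the same two bounds from one-sided calls

# Two-sided p-value versus doubling the right-tailed p-value
p_two = fisher.test(tab)$p.value
p_greater = fisher.test(tab, alternative = "greater")$p.value
c(two_sided = p_two, doubled_one_sided = 2 * p_greater)
```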

3 Fisher’s Exact Test of Independence for Large Tables (4x4 Table) in R

Consider a test of independence between detergent and cleanliness for 30 washed clothing items.

4x4 Two-way Contingency Table
Detergent \ Cleanliness High Medium Low Dirty Total
A 5 3 1 0 9
B 1 0 4 3 8
C 0 1 2 1 4
D 1 2 5 1 9
Total 7 6 12 5 30


We use the following null hypothesis \(H_0\) and alternative hypothesis \(H_1\), with the level of significance \(\alpha=0.1\):

\(H_0:\) the row (detergent) and column (cleanliness) variables are independent.

\(H_1:\) the row (detergent) and column (cleanliness) variables are dependent.

fisher.test(rbind(c(5, 3, 1, 0),
                 c(1, 0, 4, 3),
                 c(0, 1, 2, 1),
                 c(1, 2, 5, 1)))

    Fisher's Exact Test for Count Data

data:  rbind(c(5, 3, 1, 0), c(1, 0, 4, 3), c(0, 1, 2, 1), c(1, 2, 5, 1))
p-value = 0.06551
alternative hypothesis: two.sided

Interpretation:

  • P-value: With the p-value (\(p = 0.06551\)) being less than the level of significance 0.1, we reject the null hypothesis that the row (detergent) and column (cleanliness) variables are independent. Hence, there is evidence, at the 0.1 level, that detergent and cleanliness are associated.

  • Confidence Interval: This is only applicable to 2x2 tables.
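For larger tables, the exact computation can become slow or run out of memory. fisher.test() accepts a larger workspace argument, or it can estimate the p-value by Monte Carlo simulation with simulate.p.value = TRUE and B random tables sharing the observed margins. A minimal sketch on the detergent table; the seed and B = 1e5 are arbitrary choices, and the simulated value will vary slightly with them.

```r
detergent = rbind(c(5, 3, 1, 0),
                  c(1, 0, 4, 3),
                  c(0, 1, 2, 1),
                  c(1, 2, 5, 1))

# Exact p-value
exact_p = fisher.test(detergent)$p.value

# Monte Carlo p-value from B simulated tables with the same margins
set.seed(1)
sim_p = fisher.test(detergent, simulate.p.value = TRUE, B = 1e5)$p.value

c(exact = exact_p, simulated = sim_p)
```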

4 One-tailed Fisher’s Exact Test of Independence in R

Right-Tailed Test

Consider a test of independence between coach and success among 16 athletes.

Table R1: 2x2 Two-way Contingency Table
Coach \ Success Yes No Total
A 7 2 9
B 3 4 7
Total 10 6 16


We use the following null hypothesis \(H_0\) and alternative hypothesis \(H_1\), with the level of significance \(\alpha=0.05\):

\(H_0:\) the row (coach) and column (success) variables are independent.

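\(H_1:\) the row (coach) and column (success) variables are dependent, with an odds ratio greater than 1; that is, athletes with coach A have higher odds of success than athletes with coach B.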

Because the level of significance is \(\alpha=0.05\), the level of confidence is \(1 - \alpha = 0.95\).

fisher.test(rbind(c(7, 2), c(3, 4)),
            alternative = "greater",
            conf.level = 0.95)

    Fisher's Exact Test for Count Data

data:  rbind(c(7, 2), c(3, 4))
p-value = 0.1818
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
 0.4920756       Inf
sample estimates:
odds ratio 
  4.193432 

Interpretation:

  • P-value: With the p-value (\(p = 0.1818\)) being greater than the level of significance 0.05, we fail to reject the null hypothesis that the row (coach) and column (success) variables are independent.

  • Confidence Interval: With the null hypothesis odds ratio (\(\text{ratio} = 1\)) being inside the confidence interval, \([0.4920756, \infty)\), we fail to reject the null hypothesis that the row (coach) and column (success) variables are independent.

Using the general 2x2 two-way contingency table below and the conditional probability of its cell values, \(p_{table}\), given fixed row and column totals, we can derive the p-values by hand.

2x2 Two-way Contingency Table
Direction \ Response Right Left Total
Yes a b a+b
No c d c+d
Total a+c b+d n

\[p_{table} = \frac{(a+b)!~(c+d)!~(a+c)!~(b+d)!}{n!~a!~b!~c!~d!}\]

For 2x2 tables, the right-tailed p-value is the sum of the probabilities of all tables, with the same row and column totals, whose odds ratio \([(a/b)/(c/d)]\) is greater than or equal to that of Table R1 above.

To increase the odds ratio while keeping the row and column totals fixed, we increase cell \(a\) (which also increases \(d\) and decreases \(b\) and \(c\) by the same amount). Since \(a\) cannot exceed the smaller of its row total (9) and column total (10), only Table R2 (\(a = 8\)) and Table R3 (\(a = 9\)) below satisfy this requirement; hence, we add up the probabilities of Table R1, Table R2, and Table R3.

Table R2: 2x2 Two-way Contingency Table
Coach \ Success Yes No Total
A 8 1 9
B 2 5 7
Total 10 6 16


Table R3: 2x2 Two-way Contingency Table
Coach \ Success Yes No Total
A 9 0 9
B 1 6 7
Total 10 6 16


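# The numerator is the same for all three tables (same row and column totals)
num1 = prod(factorial(c(9, 7, 10, 6)))
# Table R1: a = 7, b = 2, c = 3, d = 4
denom1 = factorial(16)*prod(factorial(c(7, 2, 3, 4)))
prob1 = num1/denom1
num2 = prod(factorial(c(9, 7, 10, 6)))
# Table R2: a = 8, b = 1, c = 2, d = 5
denom2 = factorial(16)*prod(factorial(c(8, 1, 2, 5)))
prob2 = num2/denom2
num3 = prod(factorial(c(9, 7, 10, 6)))
# Table R3: a = 9, b = 0, c = 1, d = 6 (factorial(0) is 1 in R)
denom3 = factorial(16)*prod(factorial(c(9, 0, 1, 6)))
prob3 = num3/denom3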

Hence the derived right-tailed p-value is:

prob1 + prob2 + prob3
[1] 0.1818182

This matches the p-value obtained above from the fisher.test() function.
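The same right-tailed sum can be written with R's hypergeometric functions, since cell \(a\) follows a hypergeometric distribution given the margins. A minimal sketch: in dhyper(x, m, n, k) and phyper(), m and n are the two row totals and k is the first column total; the variable names a, row1, row2, col1 are our own.

```r
# Table R1: a = 7, row totals 9 and 7, first column total 10
a = 7; row1 = 9; row2 = 7; col1 = 10

# Probabilities of Tables R1, R2 and R3 (a = 7, 8, 9)
probs = dhyper(a:min(row1, col1), row1, row2, col1)
probs
sum(probs)

# The same upper tail in one call: P(A >= 7) = P(A > 6)
p_right = phyper(a - 1, row1, row2, col1, lower.tail = FALSE)
p_right
```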

Left-Tailed Test

Consider a test of independence between color and texture for 15 pebbles.

Table L1: 2x2 Two-way Contingency Table
Color \ Texture Rough Smooth Total
Red 1 7 8
Black 5 2 7
Total 6 9 15


We use the following null hypothesis \(H_0\) and alternative hypothesis \(H_1\), with the level of significance \(\alpha=0.1\):

\(H_0:\) the row (color) and column (texture) variables are independent.

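\(H_1:\) the row (color) and column (texture) variables are dependent, with an odds ratio less than 1; that is, red pebbles have lower odds of being rough than black pebbles.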

Because the level of significance is \(\alpha=0.1\), the level of confidence is \(1 - \alpha = 0.9\).

fisher.test(rbind(c(1, 7), c(5, 2)),
            alternative = "less",
            conf.level = 0.9)

    Fisher's Exact Test for Count Data

data:  rbind(c(1, 7), c(5, 2))
p-value = 0.03497
alternative hypothesis: true odds ratio is less than 1
90 percent confidence interval:
 0.0000000 0.5912598
sample estimates:
odds ratio 
0.07356123 

Interpretation:

  • P-value: With the p-value (\(p = 0.03497\)) being less than the level of significance 0.1, we reject the null hypothesis that the row (color) and column (texture) variables are independent. Hence, there is evidence of an association between color and texture, with red pebbles having lower odds of being rough than black pebbles.

  • Confidence Interval: With the null hypothesis odds ratio (\(\text{ratio} = 1\)) being outside the confidence interval, \([0, 0.5912598]\), we reject the null hypothesis that the row (color) and column (texture) variables are independent. Hence, there is evidence of an association between color and texture, with red pebbles having lower odds of being rough than black pebbles.

Using the general 2x2 two-way contingency table below and the conditional probability of its cell values, \(p_{table}\), given fixed row and column totals, we can derive the p-values by hand.

2x2 Two-way Contingency Table
Level \ Position In Out Total
Up a b a+b
Down c d c+d
Total a+c b+d n

\[p_{table} = \frac{(a+b)!~(c+d)!~(a+c)!~(b+d)!}{n!~a!~b!~c!~d!}\]

For 2x2 tables, the left-tailed p-value is the sum of the probabilities of all tables, with the same row and column totals, whose odds ratio \([(a/b)/(c/d)]\) is less than or equal to that of Table L1 above.

To reduce the odds ratio while keeping the row and column totals fixed, we reduce cell \(a\) (which also reduces \(d\) and increases \(b\) and \(c\) by the same amount). Since \(a\) cannot go below 0, only Table L2 below satisfies this requirement; hence, we add up the probabilities of Table L1 and Table L2.

Table L2: 2x2 Two-way Contingency Table
Color \ Texture Rough Smooth Total
Red 0 8 8
Black 6 1 7
Total 6 9 15
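
# The numerator is the same for both tables (same row and column totals)
num1 = prod(factorial(c(8, 7, 6, 9)))
# Table L1: a = 1, b = 7, c = 5, d = 2
denom1 = factorial(15)*prod(factorial(c(1, 7, 5, 2)))
prob1 = num1/denom1
num2 = prod(factorial(c(8, 7, 6, 9)))
# Table L2: a = 0, b = 8, c = 6, d = 1
denom2 = factorial(15)*prod(factorial(c(0, 8, 6, 1)))
prob2 = num2/denom2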

Hence the derived left-tailed p-value is:

prob1 + prob2
[1] 0.03496503

This matches the p-value obtained above from the fisher.test() function.
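As in the right-tailed case, the left-tailed sum is a lower-tail hypergeometric probability. A minimal sketch with dhyper() and phyper(), where m and n are the two row totals and k is the first column total; the variable names are our own.

```r
# Table L1: a = 1, row totals 8 and 7, first column total 6
a = 1; row1 = 8; row2 = 7; col1 = 6

# Probabilities of Tables L2 and L1 (a = 0, 1)
probs = dhyper(max(0, col1 - row2):a, row1, row2, col1)
probs
sum(probs)

# The same lower tail in one call: P(A <= 1)
p_left = phyper(a, row1, row2, col1)
p_left
```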

5 Calculate P-value for Fisher’s Exact 2x2 Table Two-tailed Test in R

For the following table:

Table T1: 2x2 Two-way Contingency Table
Present \ Used Yes No Total
Yes a=7 b=1 a+b=8
No c=2 d=4 c+d=6
Total a+c=9 b+d=5 14
fisher.test(rbind(c(7, 1), c(2, 4)),
            alternative = "two.sided")

    Fisher's Exact Test for Count Data

data:  rbind(c(7, 1), c(2, 4))
p-value = 0.09091
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
   0.6418261 779.1595463
sample estimates:
odds ratio 
  10.98111 

The p-value is 0.09091.

Right-tail probabilities:

The sample odds ratio, \((a/b)/(c/d) = (7/1)/(2/4) = 14\), is greater than 1. Hence, calculate the right-tail probabilities as in the right-tailed example in the section above: the probability of Table T1 and of each table with a greater odds ratio (here, only the table with \(a = 8\)), keeping the row and column totals fixed.

# Table T1: a = 7, b = 1, c = 2, d = 4
num1 = prod(factorial(c(8, 6, 9, 5)))
denom1 = factorial(14)*prod(factorial(c(7, 1, 2, 4)))
probr1 = num1/denom1; probr1
[1] 0.05994006
# Table with a = 8, b = 0, c = 1, d = 5 (the largest possible odds ratio)
num2 = prod(factorial(c(8, 6, 9, 5)))
denom2 = factorial(14)*prod(factorial(c(8, 0, 1, 5)))
probr2 = num2/denom2; probr2
[1] 0.002997003

Left-tail probabilities:

For the left tail, reduce the odds ratio (cell \(a\)) to the smallest possible value, as in Table T2 below (\(a = 3\)). Then, increasing the odds ratio again one step at a time, add the probabilities of tables whose probability is less than or equal to that of Table T1, and stop at the first table whose probability exceeds it.

Table T2: 2x2 Two-way Contingency Table
Present \ Used Yes No Total
Yes a=3 b=5 a+b=8
No c=6 d=0 c+d=6
Total a+c=9 b+d=5 14


# Table T2: a = 3, b = 5, c = 6, d = 0 (the smallest possible odds ratio)
num1 = prod(factorial(c(8, 6, 9, 5)))
denom1 = factorial(14)*prod(factorial(c(3, 5, 6, 0)))
probl1 = num1/denom1; probl1
[1] 0.02797203
# Next table: a = 4, b = 4, c = 5, d = 1
num2 = prod(factorial(c(8, 6, 9, 5)))
denom2 = factorial(14)*prod(factorial(c(4, 4, 5, 1)))
probl2 = num2/denom2; probl2
[1] 0.2097902

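The probability of the second table (\(a = 4\), 0.2097902) is greater than the probability of Table T1 (0.05994006); hence, we stop and exclude it. As \(a\) increases, the hypergeometric probabilities rise to a single peak and then fall, so the remaining tables between it and Table T1 (\(a = 5\) and \(a = 6\)) are also more probable than Table T1 and are excluded as well.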
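The stopping rule can be checked by listing every table that shares Table T1's margins. A minimal sketch with dhyper(); the relative tolerance of 1e-7 used when comparing probabilities is our assumption, mirroring the small tolerance that fisher.test() applies (as we understand its source) to guard against floating-point ties.

```r
# Every table with Table T1's margins: row totals 8 and 6, first column total 9
support = max(0, 9 - 6):min(8, 9)    # possible values of cell a: 3 to 8
probs = dhyper(support, 8, 6, 9)
data.frame(a = support, prob = round(probs, 7))

# Probability of Table T1 (a = 7)
p_obs = dhyper(7, 8, 6, 9)

# Values of a whose tables are no more likely than Table T1
support[probs <= p_obs * (1 + 1e-7)]
```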

Two-tailed p-value:

The derived two-tailed p-value is then the sum of the right-tail and left-tail probabilities above:

probr1 + probr2 + probl1
[1] 0.09090909

This matches the p-value obtained above from the fisher.test() function.

A similar approach can be followed when the odds ratio of the starting table is less than 1: compute the left-tailed p-value; find the table with the highest possible odds ratio; then compute the probabilities of tables as you reduce the odds ratio one step at a time. Finally, add the left-tail probabilities and the right-tail probabilities of tables with probability less than or equal to that of the starting table.
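The whole two-tailed procedure can be wrapped in a small function. The sketch below is a hypothetical helper, fisher_two_sided(), not part of R: it enumerates every 2x2 table with the observed margins through dhyper() and sums the probabilities no greater than that of the observed table. The rel_tol argument is our own addition for floating-point ties. It handles both a starting table with odds ratio above 1 (Table T1) and one below 1 (Table L1), and the results can be compared against fisher.test().

```r
# Hypothetical helper: two-sided Fisher p-value for a 2x2 table
fisher_two_sided = function(tab, rel_tol = 1e-7) {
  row1 = sum(tab[1, ]); row2 = sum(tab[2, ]); col1 = sum(tab[, 1])
  # All possible values of cell a given the margins
  support = max(0, col1 - row2):min(row1, col1)
  probs = dhyper(support, row1, row2, col1)
  p_obs = dhyper(tab[1, 1], row1, row2, col1)
  # Sum over tables no more likely than the observed one
  sum(probs[probs <= p_obs * (1 + rel_tol)])
}

# Starting table with odds ratio above 1 (Table T1)
tab_t1 = rbind(c(7, 1), c(2, 4))
c(fisher_two_sided(tab_t1), fisher.test(tab_t1)$p.value)

# Starting table with odds ratio below 1 (Table L1)
tab_l1 = rbind(c(1, 7), c(5, 2))
c(fisher_two_sided(tab_l1), fisher.test(tab_l1)$p.value)
```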

Copyright © 2020 - 2024. All Rights Reserved by Stats Codes