Here, we discuss the Wilcoxon rank-sum test in R, with interpretations of its test statistic, p-value, and confidence interval.
The Wilcoxon rank-sum (or Mann-Whitney U) test in R can be performed with the wilcox.test() function from the base "stats" package.
Under the assumption that the two distributions have similar shapes or are symmetric, the Wilcoxon rank-sum test can be used to test whether the difference between the medians of the two populations, from which two independent samples are drawn, is equal to a certain value (stated in the null hypothesis) or not. It is a non-parametric alternative to the two independent samples t-test with the equal variance assumption.
In the Wilcoxon rank-sum test, the test statistic is based on a sum of ranks. The null hypothesis difference between medians is first subtracted from each of the first sample's values. These shifted values are then ranked together with the second sample's values, and the ranks of the shifted first-sample values are summed.
Question | Are the medians equal, or difference equal to \(m_0\)? | Is median x greater than median y, or difference greater than \(m_0\)? | Is median x less than median y, or difference less than \(m_0\)? |
Form of Test | Two-tailed test | Right-tailed test | Left-tailed test |
Null Hypothesis, \(H_0\) | \(m_x = m_y\); \(\quad\) \(m_x - m_y = m_0\) | \(m_x = m_y\); \(\quad\) \(m_x - m_y = m_0\) | \(m_x = m_y\); \(\quad\) \(m_x - m_y = m_0\) |
Alternate Hypothesis, \(H_1\) | \(m_x \neq m_y\); \(\quad\) \(m_x - m_y \neq m_0\) | \(m_x > m_y\); \(\quad\) \(m_x - m_y > m_0\) | \(m_x < m_y\); \(\quad\) \(m_x - m_y < m_0\) |
# Create the data samples for the Wilcoxon rank-sum test
data_x = c(4.8, 4.2, 4.3, 3.0, 3.9)
data_y = c(4.6, 3.6, 5.0, 5.6,
3.5, 5.1, 4.7, 4.4)
# Run the Wilcoxon rank-sum test with specifications
wilcox.test(data_x, data_y,
mu = 0, alternative = "two.sided",
conf.int = TRUE, conf.level = 0.95)
Wilcoxon rank sum exact test
data: data_x and data_y
W = 11, p-value = 0.2222
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
-1.4 0.4
sample estimates:
difference in location
-0.5
Argument | Usage |
x, y | x is the first sample's data values, y is the second sample's data values |
mu | Difference between the population medians in the null hypothesis (default = 0) |
alternative | Set alternative hypothesis as "greater", "less", or the default "two.sided" |
exact | For n_x<50 and n_y<50, and no rank ties: Set to FALSE to compute p-value based on normal distribution (default = NULL, which gives the exact p-value in this case) |
correct | For cases with non-exact p-values: Set to FALSE to remove continuity correction (default = TRUE) |
conf.int | Set to TRUE to include the confidence interval (default = FALSE) |
conf.level | Level of confidence for the confidence interval (default = 0.95) |
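As a rough illustration of how the exact and correct arguments change the p-value, the sketch below reruns the small example above under three settings. The names p_exact, p_normal_cc, and p_normal_raw are only illustrative.

```r
data_x <- c(4.8, 4.2, 4.3, 3.0, 3.9)
data_y <- c(4.6, 3.6, 5.0, 5.6, 3.5, 5.1, 4.7, 4.4)

# Exact p-value (used here by default: small samples, no ties)
p_exact <- wilcox.test(data_x, data_y)$p.value

# Normal approximation with continuity correction
p_normal_cc <- wilcox.test(data_x, data_y, exact = FALSE)$p.value

# Normal approximation without continuity correction
p_normal_raw <- wilcox.test(data_x, data_y,
                            exact = FALSE, correct = FALSE)$p.value

round(c(exact = p_exact, normal_cc = p_normal_cc,
        normal_raw = p_normal_raw), 4)
```

The continuity correction pulls the standardized statistic toward zero, so the corrected p-value is never smaller than the uncorrected one.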
# Create data
data_x = rnorm(35); data_y = rnorm(35)
# Create object
wrst_object = wilcox.test(data_x, data_y,
mu = 0, alternative = "two.sided",
conf.int = TRUE, conf.level = 0.95)
# Extract a component
wrst_object$statistic
W
597
Test Component | Usage |
wrst_object$statistic | Test-statistic value |
wrst_object$p.value | P-value |
wrst_object$estimate | Point estimate of population difference between medians when conf.int = TRUE |
wrst_object$conf.int | Confidence interval when conf.int = TRUE |
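The components in the table can be pulled out one at a time. Below is a minimal sketch using the small, fixed data set from the first example, so that the values are reproducible (unlike the rnorm() samples).

```r
data_x <- c(4.8, 4.2, 4.3, 3.0, 3.9)
data_y <- c(4.6, 3.6, 5.0, 5.6, 3.5, 5.1, 4.7, 4.4)
wrst_object <- wilcox.test(data_x, data_y, conf.int = TRUE)

wrst_object$statistic     # test statistic W
wrst_object$p.value       # p-value
wrst_object$estimate      # estimated difference in location
wrst_object$conf.int      # confidence interval

# The confidence level is stored as an attribute of the interval
attr(wrst_object$conf.int, "conf.level")
```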
With \(\text{rank}(1, 3, 3, 5, 7, 7, 7) = (1, 2.5, 2.5, 4, 6, 6, 6)\).
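The ranking with ties shown above can be checked with the base rank() function, whose default tie handling assigns each tied value the average of the ranks it spans.

```r
values <- c(1, 3, 3, 5, 7, 7, 7)

# ties.method = "average" is the default
ranks <- rank(values)
ranks
# 1.0 2.5 2.5 4.0 6.0 6.0 6.0
```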
Let \(x_i's\) (\(1\leq i \leq n_x\)), and \(y_j's\) (\(1\leq j \leq n_y\)) be the sample values,
\(n_x\) is the number of \(x\) observations, and \(n_y\) is the number of \(y\) observations.
\(R_i\) is the rank of \(x_i - m_0\), among all \(x_i - m_0\) and all \(y_j\) values, hence, \(n_x + n_y\) values,
\(m_0\) is the population difference between the medians to be tested and set in the null hypothesis, and
the total number of observations is \(n\), with \(n = n_x + n_y\).
The Wilcoxon rank-sum test has a test statistic, \(W\), of the form:
\[W = \sum_{i=1}^{n_x} R_i - \frac{n_x(n_x+1)}{2}.\]
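As a small worked sketch of this formula, the code below computes \(W\) step by step for the first example's data with \(m_0 = 0\). The variable names (m0, shifted_x, R) are chosen here for illustration only.

```r
data_x <- c(4.8, 4.2, 4.3, 3.0, 3.9)
data_y <- c(4.6, 3.6, 5.0, 5.6, 3.5, 5.1, 4.7, 4.4)
m0 <- 0                              # null hypothesis difference

shifted_x <- data_x - m0             # x_i - m_0
r  <- rank(c(shifted_x, data_y))     # ranks among all n_x + n_y values
nx <- length(data_x)
R  <- r[seq_len(nx)]                 # R_i: ranks of the shifted x values

W <- sum(R) - nx * (nx + 1) / 2      # 26 - 15
W
# 11
```

The result agrees with the W = 11 reported by wilcox.test() for the same data.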
For dependent samples, see the Wilcoxon signed rank test for paired samples and the sign test for paired samples.
For extension to three or more groups, see the Kruskal-Wallis test.
For large samples (\(n_x\geq50\) or \(n_y\geq50\)), or cases with rank ties among the \(x_i - m_0\) and \(y_j\) values:
With \(T\) as the number of distinct rank values, and \(t_k\) as the number of observations that share the \(k\)-th distinct rank (so \(t_k = 1\) for an untied value), inference on \(W\) and the test outcome is based on the normal distribution approximation obtained by standardizing \(W\).
With \(\frac{\sum_{k=1}^{T}(t_k^3-t_k)}{n(n-1)}=0\) if there are no ties (all \(t_k = 1\)),
\[z = \frac{W - \frac{n_x n_y}{2}}{\sqrt{\frac{n_x n_y}{12}\left((n + 1) - \frac{\sum_{k=1}^{T}(t_k^3-t_k)}{n(n-1)} \right)}}.\]
Applying continuity correction for the normal distribution approximation (the default in R),
\[z = \frac{(W+c) - \frac{n_x n_y}{2}}{\sqrt{\frac{n_x n_y}{12}\left((n + 1) - \frac{\sum_{k=1}^{T}(t_k^3-t_k)}{n(n-1)} \right)}}.\]
For a two-sided test, \(c=0.5\) if \(W<\frac{n_x n_y}{2}\), \(c=-0.5\) if \(W>\frac{n_x n_y}{2}\), and \(c=0\) if \(W=\frac{n_x n_y}{2}\). For a one-sided test, \(c=-0.5\) if the alternative is "greater", and \(c=0.5\) if it is "less".
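The standardization and the choice of \(c\) can be sketched as a small function. Note that wrs_z() is a hypothetical helper written for this illustration and is not part of the "stats" package. It follows the formulas above and is compared against wilcox.test() with exact = FALSE.

```r
# Hypothetical helper: z-value for the normal approximation of W
wrs_z <- function(x, y, mu = 0,
                  alternative = c("two.sided", "greater", "less"),
                  correct = TRUE) {
  alternative <- match.arg(alternative)
  r  <- rank(c(x - mu, y))
  nx <- length(x); ny <- length(y); n <- nx + ny
  W  <- sum(r[seq_along(x)]) - nx * (nx + 1) / 2
  tk <- table(r)                       # t_k: counts per distinct rank
  sigma <- sqrt((nx * ny / 12) *
                ((n + 1) - sum(tk^3 - tk) / (n * (n - 1))))
  d  <- W - nx * ny / 2
  cc <- 0                              # continuity correction c
  if (correct) {
    cc <- switch(alternative,
                 two.sided = -0.5 * sign(d),
                 greater   = -0.5,
                 less      =  0.5)
  }
  (d + cc) / sigma
}

data_x <- c(4.8, 4.2, 4.3, 3.0, 3.9)
data_y <- c(4.6, 3.6, 5.0, 5.6, 3.5, 5.1, 4.7, 4.4)

z <- wrs_z(data_x, data_y)
p_manual  <- 2 * pnorm(-abs(z))        # two-sided p-value
p_builtin <- wilcox.test(data_x, data_y, exact = FALSE)$p.value
c(z = z, p_manual = p_manual, p_builtin = p_builtin)
```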
For small samples (\(n_x < 50\) and \(n_y < 50\)) with no rank ties:
The p-value is based on the exact distribution of the Wilcoxon rank sum statistic \(W\), with sample sizes \(n_x\) and \(n_y\).
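For the exact case, the distribution function pwilcox() from the "stats" package gives the tail probabilities of \(W\). The sketch below reproduces a two-sided exact p-value for the first example's data by doubling the smaller tail and capping the result at 1. The name p_tail is illustrative.

```r
data_x <- c(4.8, 4.2, 4.3, 3.0, 3.9)
data_y <- c(4.6, 3.6, 5.0, 5.6, 3.5, 5.1, 4.7, 4.4)
nx <- length(data_x); ny <- length(data_y)

r <- rank(c(data_x, data_y))
W <- sum(r[seq_len(nx)]) - nx * (nx + 1) / 2

# Tail probability on the side where W falls
p_tail <- if (W > nx * ny / 2) {
  pwilcox(W - 1, nx, ny, lower.tail = FALSE)   # P(W' >= W)
} else {
  pwilcox(W, nx, ny)                           # P(W' <= W)
}
p_exact <- min(2 * p_tail, 1)
p_exact
```

With these data the result should agree with the p-value of 0.2222 from wilcox.test() in the first example.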
Enter the data by hand.
data_x = c(9.47, 9.00, 11.20, 10.28, 12.60, 7.75,
9.73, 9.58, 7.85, 12.00, 9.79, 7.23)
data_y = c(8.82, 7.79, 12.08, 15.09, 8.65, 9.82,
10.56, 15.42, 5.34, 8.97, 7.59, 9.54)
For the following null hypothesis \(H_0\), and alternative hypothesis \(H_1\), with the level of significance \(\alpha=0.05\).
\(H_0:\) the difference between the population medians is equal to 0 (\(m_x - m_y = 0\)).
\(H_1:\) the difference between the population medians is not equal to 0 (\(m_x - m_y \neq 0\), hence the default two-sided).
Because the level of significance is \(\alpha=0.05\), the level of confidence is \(1 - \alpha = 0.95\).
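The wilcox.test() function has the default alternative as "two.sided", the default difference between medians as 0, and the default level of confidence as 0.95. Hence, you do not need to specify the "alternative", "mu", and "conf.level" arguments in this case.
wilcox.test(data_x, data_y, conf.int = TRUE)
Or:
wilcox.test(data_x, data_y,
alternative = "two.sided", mu = 0,
conf.int = TRUE, conf.level = 0.95)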
Wilcoxon rank sum exact test
data: data_x and data_y
W = 75, p-value = 0.8874
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
-2.07 1.94
sample estimates:
difference in location
0.11
The estimate of the difference between the medians, \(\tilde d\), is 0.11,
test statistic, \(W\), is 75,
the p-value, \(p\), is 0.8874,
the 95% confidence interval is [-2.07, 1.94].
Note that for wilcox.test() in R, the p-value method and the confidence interval method may disagree in some edge cases, as the p-value is based on either the exact distribution or the normal distribution, and the confidence interval is sometimes based on approximations.
P-value: With the p-value (\(p = 0.8874\)) being greater than the level of significance 0.05, we fail to reject the null hypothesis that the difference between the population medians is equal to 0.
Confidence Interval: With the null hypothesis difference between the medians (\(m_x - m_y = 0\)) being inside the confidence interval, \([-2.07, 1.94]\), we fail to reject the null hypothesis that the difference between the population medians is equal to 0.
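The two decision rules above can be written out directly. This is a minimal sketch for the same data, where alpha, m0, reject_by_p, and reject_by_ci are names introduced here for illustration.

```r
data_x <- c(9.47, 9.00, 11.20, 10.28, 12.60, 7.75,
            9.73, 9.58, 7.85, 12.00, 9.79, 7.23)
data_y <- c(8.82, 7.79, 12.08, 15.09, 8.65, 9.82,
            10.56, 15.42, 5.34, 8.97, 7.59, 9.54)
alpha <- 0.05    # level of significance
m0    <- 0       # null hypothesis difference between medians

res <- wilcox.test(data_x, data_y, mu = m0,
                   conf.int = TRUE, conf.level = 1 - alpha)

# Rule 1: reject when the p-value is below alpha
reject_by_p <- res$p.value < alpha

# Rule 2: reject when m0 lies outside the confidence interval
reject_by_ci <- m0 < res$conf.int[1] || m0 > res$conf.int[2]

c(reject_by_p = reject_by_p, reject_by_ci = reject_by_ci)
# both FALSE: fail to reject the null hypothesis
```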
Using the USJudgeRatings data from the "datasets" package, with 10 sample rows of the 43 rows shown below:
Judge_data = USJudgeRatings[,1:6]
colnames(Judge_data) = c("CONT", "INTG", "DMNR", "DILG", "CFMG", "DECI")
rownames(Judge_data) = NULL
Judge_data
CONT INTG DMNR DILG CFMG DECI
1 5.7 7.9 7.7 7.3 7.1 7.4
13 6.7 8.6 8.2 6.8 6.9 6.6
16 7.3 8.0 7.4 7.7 7.3 7.3
18 7.7 7.7 6.7 7.5 7.4 7.5
21 7.1 8.2 7.7 7.1 6.6 6.6
24 6.2 8.3 8.1 7.7 7.4 7.3
25 7.5 8.7 8.5 8.6 8.5 8.4
31 6.6 7.4 6.9 8.4 8.0 7.9
36 8.5 8.3 8.1 8.3 8.4 8.2
43 8.6 7.4 7.0 7.5 7.5 7.7
For “INTG” as the x group versus “CONT” as the y group.
For the following null hypothesis \(H_0\), and alternative hypothesis \(H_1\), with the level of significance \(\alpha=0.1\), without continuity correction.
\(H_0:\) population median of \(x\) minus population median of \(y\) is equal to 0.3 (\(m_x - m_y = 0.3\)).
\(H_1:\) population median of \(x\) minus population median of \(y\) is not equal to 0.3 (\(m_x - m_y \neq 0.3\), hence the default two-sided).
Because the level of significance is \(\alpha=0.1\), the level of confidence is \(1 - \alpha = 0.9\).
wilcox.test(Judge_data$INTG, Judge_data$CONT,
alternative = "two.sided", mu = 0.3,
correct = FALSE,
conf.int = TRUE, conf.level = 0.9)
Warning in wilcox.test.default(Judge_data$INTG, Judge_data$CONT, alternative =
"two.sided", : cannot compute exact p-value with ties
Warning in wilcox.test.default(Judge_data$INTG, Judge_data$CONT, alternative =
"two.sided", : cannot compute exact confidence intervals with ties
Wilcoxon rank sum test
data: Judge_data$INTG and Judge_data$CONT
W = 1200, p-value = 0.01725
alternative hypothesis: true location shift is not equal to 0.3
90 percent confidence interval:
0.4000099 1.0000483
sample estimates:
difference in location
0.7000586
The warnings appear because there are ties in the data. Hence, the p-value is based on the normal approximation rather than the exact distribution.
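Whether ties are present can be checked before running the test. The sketch below looks for repeated values among the shifted \(x\) values and the \(y\) values, which is the situation that triggers the warnings. The names combined, has_ties, and tie_sizes are illustrative.

```r
x  <- USJudgeRatings$INTG
y  <- USJudgeRatings$CONT
m0 <- 0.3                                  # null hypothesis difference

combined <- c(x - m0, y)                   # values that get ranked together
has_ties <- anyDuplicated(combined) > 0    # TRUE when any value repeats

tie_sizes <- table(combined)
tie_sizes <- tie_sizes[tie_sizes > 1]      # groups with more than one value

has_ties
length(tie_sizes)                          # number of tied groups
```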
P-value: With the p-value (\(p = 0.01725\)) being less than the level of significance 0.1, we reject the null hypothesis that population median of \(x\) minus population median of \(y\) is equal to 0.3.
Confidence Interval: With the null hypothesis difference between the medians (\(m_x - m_y = 0.3\)) being outside the confidence interval, \([0.4000099, 1.0000483]\), we reject the null hypothesis that population median of \(x\) minus population median of \(y\) is equal to 0.3.
Using the USJudgeRatings data from the "datasets" package, with 10 sample rows of the 43 rows shown below:
Judge_data = USJudgeRatings[,7:12]
colnames(Judge_data) = c("PREP", "FAMI", "ORAL", "WRIT", "PHYS", "RTEN")
rownames(Judge_data) = NULL
Judge_data
PREP FAMI ORAL WRIT PHYS RTEN
1 7.1 7.1 7.1 7.0 8.3 7.8
8 4.8 5.1 4.7 4.9 6.8 5.0
16 7.3 7.2 7.1 7.2 8.0 7.6
24 7.3 7.3 7.2 7.3 7.8 7.6
27 7.8 7.8 7.8 7.7 8.3 8.2
28 8.4 8.3 8.3 8.3 8.8 8.7
30 9.1 9.1 8.9 9.0 8.9 9.2
33 8.4 8.5 8.1 8.3 8.7 8.3
42 7.8 8.2 8.0 8.1 8.3 8.1
43 7.4 7.2 6.9 7.0 7.8 7.1
For “PHYS” as the x group versus “ORAL” as the y group.
For the following null hypothesis \(H_0\), and alternative hypothesis \(H_1\), with the level of significance \(\alpha=0.1\).
\(H_0:\) population median of \(x\) minus population median of \(y\) is equal to 0.4 (\(m_x - m_y = 0.4\)).
\(H_1:\) population median of \(x\) minus population median of \(y\) is greater than 0.4 (\(m_x - m_y > 0.4\), hence one-sided).
Because the level of significance is \(\alpha=0.1\), the level of confidence is \(1 - \alpha = 0.9\).
wilcox.test(Judge_data$PHYS, Judge_data$ORAL,
alternative = "greater", mu = 0.4,
conf.int = TRUE, conf.level = 0.9)
Warning in wilcox.test.default(Judge_data$PHYS, Judge_data$ORAL, alternative =
"greater", : cannot compute exact p-value with ties
Warning in wilcox.test.default(Judge_data$PHYS, Judge_data$ORAL, alternative =
"greater", : cannot compute exact confidence intervals with ties
Wilcoxon rank sum test with continuity correction
data: Judge_data$PHYS and Judge_data$ORAL
W = 1079, p-value = 0.09149
alternative hypothesis: true location shift is greater than 0.4
90 percent confidence interval:
0.4000048 Inf
sample estimates:
difference in location
0.6999724
P-value: With the p-value (\(p = 0.09149\)) being less than the level of significance 0.1, we reject the null hypothesis that the population median of \(x\) minus population median of \(y\) is equal to 0.4.
Confidence Interval: With the null hypothesis difference between the medians (\(m_x - m_y = 0.4\)) being outside the confidence interval, \([0.4000048, \infty)\), we reject the null hypothesis that population median of \(x\) minus population median of \(y\) is equal to 0.4.
For “PREP” as the x group versus “FAMI” as the y group.
For the following null hypothesis \(H_0\), and alternative hypothesis \(H_1\), with the level of significance \(\alpha=0.05\).
\(H_0:\) population median of \(x\) and the population median of \(y\) are equal (\(m_x = m_y\)).
\(H_1:\) population median of \(x\) is less than population median of \(y\) (\(m_x < m_y\), hence one-sided).
Because the level of significance is \(\alpha=0.05\), the level of confidence is \(1 - \alpha = 0.95\).
wilcox.test(Judge_data$PREP, Judge_data$FAMI,
alternative = "less", mu = 0,
conf.int = TRUE, conf.level = 0.95)
Warning in wilcox.test.default(Judge_data$PREP, Judge_data$FAMI, alternative =
"less", : cannot compute exact p-value with ties
Warning in wilcox.test.default(Judge_data$PREP, Judge_data$FAMI, alternative =
"less", : cannot compute exact confidence intervals with ties
Wilcoxon rank sum test with continuity correction
data: Judge_data$PREP and Judge_data$FAMI
W = 911, p-value = 0.4553
alternative hypothesis: true location shift is less than 0
95 percent confidence interval:
-Inf 0.3000546
sample estimates:
difference in location
-3.72148e-05
P-value: With the p-value (\(p = 0.4553\)) being greater than the level of significance 0.05, we fail to reject the null hypothesis that the population median of \(x\) and the population median of \(y\) are equal.
Confidence Interval: With the null hypothesis difference between the medians (\(m_x - m_y = 0\)) being inside the confidence interval, \((-\infty, 0.3000546]\), we fail to reject the null hypothesis that the population median of \(x\) and the population median of \(y\) are equal.
Here, for a Wilcoxon rank-sum test, we show how to get the test statistic (and z-value) and the p-value from the wilcox.test() function in R, or by written code.
data_x = Judge_data$RTEN; data_y = Judge_data$WRIT
wrst_object = wilcox.test(data_x, data_y,
alternative = "two.sided", mu = 0.1)
Warning in wilcox.test.default(data_x, data_y, alternative = "two.sided", :
cannot compute exact p-value with ties
Wilcoxon rank sum test with continuity correction
data: data_x and data_y
W = 1032, p-value = 0.3551
alternative hypothesis: true location shift is not equal to 0.1
\[W = \sum_{i=1}^{n_x} R_i - \frac{n_x(n_x+1)}{2}.\]
With continuity correction:
\[z = \frac{(W+c) - \frac{n_x n_y}{2}}{\sqrt{\frac{n_x n_y}{12}\left((n + 1) - \frac{\sum_{k=1}^{T}(t_k^3-t_k)}{n(n-1)} \right)}}.\]
For a two-sided test, \(c=0.5\) if \(W<\frac{n_x n_y}{2}\), \(c=-0.5\) if \(W>\frac{n_x n_y}{2}\), and \(c=0\) if \(W=\frac{n_x n_y}{2}\). For a one-sided test, \(c=-0.5\) if the alternative is "greater", and \(c=0.5\) if it is "less".
W
1032
[1] 1032
Same as:
mu = 0.1; x = (data_x - mu)
r = rank(c(x, data_y))
nx = length(data_x); ny = length(data_y)
W = sum(r[seq_along(x)]) - nx*(nx+1)/2
W
[1] 1032
For z-value:
c = -0.5 # Given two-sided and W > (nx*ny)/2 (1032>924.5)
t = table(r)
n = nx + ny
num = (W + c) - (nx*ny)/2
a = (nx*ny)/12; b = (n+1) - sum(t^3 - t)/(n*(n-1))
denom = sqrt(a*b)
z = num/denom
z
[1] 0.9246861
Two-tailed: For a positive z-value (\(z^+\)) or a negative z-value (\(z^-\)),
\(\text{p-value} = 2 \times P(Z>z^+)\) or \(\text{p-value} = 2 \times P(Z<z^-)\).
One-tailed: For right-tail, \(\text{p-value} = P(Z>z)\), or for left-tail, \(\text{p-value} = P(Z<z)\).
[1] 0.3551292
Same as:
Note that the p-value depends on the z-value (\(z = 0.9246861\)). We also use the distribution function pnorm() for the normal distribution in R.
[1] 0.3551292
[1] 0.3551292
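As a sketch of the pnorm() step, the two-tailed p-value can be recovered from the z-value in several equivalent ways. The z-value is entered here in its rounded form, so agreement with the value above is only up to rounding.

```r
z <- 0.9246861                           # z-value from above (rounded)

p1 <- 2 * pnorm(z, lower.tail = FALSE)   # 2 * P(Z > z)
p2 <- 2 * (1 - pnorm(z))                 # same upper tail via the complement
p3 <- 2 * pnorm(-abs(z))                 # works for either sign of z

c(p1, p2, p3)
```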
One-tailed example:
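A possible one-tailed sketch, assuming the earlier "PHYS" versus "ORAL" setting (alternative "greater", \(m_0 = 0.4\), with continuity correction). It computes the right-tail p-value by hand and compares it with wilcox.test(). The names mu0, sigma, p_manual, and p_builtin are illustrative.

```r
x   <- USJudgeRatings$PHYS
y   <- USJudgeRatings$ORAL
mu0 <- 0.4                                 # null hypothesis difference

r  <- rank(c(x - mu0, y))                  # ranks of the combined values
nx <- length(x); ny <- length(y); n <- nx + ny
W  <- sum(r[seq_along(x)]) - nx * (nx + 1) / 2

tk    <- table(r)                          # counts per distinct rank
sigma <- sqrt((nx * ny / 12) *
              ((n + 1) - sum(tk^3 - tk) / (n * (n - 1))))
z     <- (W - 0.5 - nx * ny / 2) / sigma   # c = -0.5 for "greater"

p_manual  <- pnorm(z, lower.tail = FALSE)  # right tail: P(Z > z)
p_builtin <- suppressWarnings(
  wilcox.test(x, y, alternative = "greater", mu = mu0)$p.value
)
c(W = W, z = z, p_manual = p_manual, p_builtin = p_builtin)
```

For a left-tailed test, the same steps would apply with \(c = 0.5\) and pnorm(z) for the left tail.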
Copyright © 2020 - 2024. All Rights Reserved by Stats Codes