1 Test Statistic for Wilcoxon Rank-Sum Test in R
2 Simple Wilcoxon Rank-Sum Test in R
3 Two-tailed Wilcoxon Rank-Sum Test in R
4 One-tailed Wilcoxon Rank-Sum Test in R
5 Wilcoxon Rank-Sum Test: Test Statistics & P-values in R

Here, we discuss the Wilcoxon rank-sum test in R with interpretations, including test statistics, p-values, and confidence intervals.

The Wilcoxon rank-sum (or Mann-Whitney U) test in R can be performed with the wilcox.test() function from the base "stats" package.

Assuming that the two distributions have similar shapes (or are symmetric), the Wilcoxon rank-sum test can be used to test whether the difference between the medians of the two populations from which two independent samples are drawn is equal to a value stated in the null hypothesis. It is a non-parametric alternative to the two independent samples t-test with the equal variance assumption.

In the Wilcoxon rank-sum test, the test statistic is based on a sum of ranks. The null hypothesis difference between medians is subtracted from each value in the first sample, these shifted values are ranked together with the second sample’s values, and the ranks of the shifted first-sample values are summed.

Wilcoxon Rank-Sum Tests & Hypotheses
**With the assumption that the distributions have similar shapes or are symmetric.**
| Question | Are the medians equal, or difference equal to $m_0$? | Is median x greater than median y, or difference greater than $m_0$? | Is median x less than median y, or difference less than $m_0$? |
|---|---|---|---|
| Form of Test | Two-tailed | Right-tailed test | Left-tailed test |
| Null Hypothesis, $H_0$ | $m_x = m_y$; $\quad$ $m_x - m_y = m_0$ | $m_x = m_y$; $\quad$ $m_x - m_y = m_0$ | $m_x = m_y$; $\quad$ $m_x - m_y = m_0$ |
| Alternate Hypothesis, $H_1$ | $m_x \neq m_y$; $\quad$ $m_x - m_y \neq m_0$ | $m_x > m_y$; $\quad$ $m_x - m_y > m_0$ | $m_x < m_y$; $\quad$ $m_x - m_y < m_0$ |

Sample Steps to Run a Wilcoxon Rank-Sum Test:

# Create the data samples for the Wilcoxon rank-sum test

data_x = c(4.8, 4.2, 4.3, 3.0, 3.9)
data_y = c(4.6, 3.6, 5.0, 5.6,
           3.5, 5.1, 4.7, 4.4)

# Run the Wilcoxon rank-sum test with specifications

wilcox.test(data_x, data_y,
            mu = 0, alternative = "two.sided",
            conf.int = TRUE, conf.level = 0.95)


    Wilcoxon rank sum exact test

data:  data_x and data_y
W = 11, p-value = 0.2222
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -1.4  0.4
sample estimates:
difference in location 
                  -0.5

Table of Some Wilcoxon Rank-Sum Test Arguments in R
| Argument | Usage |
|---|---|
| x, y | x is the first sample's data values, y is the second sample's data values |
| mu | Difference between the population medians (location shift) under the null hypothesis, (default = 0) |
| alternative | Set the alternative hypothesis to "greater", "less", or the default "two.sided" |
| exact | Set to `FALSE` to compute the p-value from the normal approximation. With the default `NULL`, an exact p-value is computed when n_x<50 and n_y<50 and there are no rank ties |
| correct | For cases with non-exact p-values: set to `FALSE` to remove the continuity correction, (default = `TRUE`) |
| conf.int | Set to `TRUE` to include the confidence interval, (default = `FALSE`) |
| conf.level | Level of confidence for the confidence interval, (default = 0.95) |
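
As a rough illustration of how `exact` and `correct` interact, the sketch below reruns the small example above in three ways. The object names (`fit_exact`, `fit_normal_cc`, `fit_normal`) are only illustrative and are not part of the original example.

```r
# Same small samples as in the example above
data_x = c(4.8, 4.2, 4.3, 3.0, 3.9)
data_y = c(4.6, 3.6, 5.0, 5.6, 3.5, 5.1, 4.7, 4.4)

# Exact p-value (chosen automatically here: small samples, no ties)
fit_exact = wilcox.test(data_x, data_y)

# Normal approximation with continuity correction
fit_normal_cc = wilcox.test(data_x, data_y, exact = FALSE)

# Normal approximation without continuity correction
fit_normal = wilcox.test(data_x, data_y, exact = FALSE, correct = FALSE)

# W is the same in all three fits; only the p-value calculation changes
c(exact     = fit_exact$p.value,
  normal_cc = fit_normal_cc$p.value,
  normal    = fit_normal$p.value)
```

The three p-values differ slightly, which is expected with samples this small.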

Creating a Wilcoxon Rank-Sum Test Object:

# Create data (random draws without a seed, so W below will differ from run to run)
data_x = rnorm(35); data_y = rnorm(35)

# Create object
wrst_object = wilcox.test(data_x, data_y,
                          mu = 0, alternative = "two.sided",
                          conf.int = TRUE, conf.level = 0.95)

# Extract a component
wrst_object$statistic

  W 
597

Table of Some Wilcoxon Rank-Sum Test Object Outputs in R
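| Test Component | Usage |
|---|---|
| wrst_object$statistic | Test-statistic value |
| wrst_object$p.value | P-value |
| wrst_object$estimate | Point estimate of the location shift when `conf.int = TRUE`; read as the difference between the population medians under the similar-shape assumption |
| wrst_object$conf.int | Confidence interval for the location shift when `conf.int = TRUE` |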
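
If several components are needed at once, one possible pattern is to collect them in a named list. This is only a sketch using the small samples from the first example; the name `wrst_summary` and its field names are made up for illustration.

```r
data_x = c(4.8, 4.2, 4.3, 3.0, 3.9)
data_y = c(4.6, 3.6, 5.0, 5.6, 3.5, 5.1, 4.7, 4.4)

wrst_object = wilcox.test(data_x, data_y, conf.int = TRUE)

# Collect the main components in one named list
wrst_summary = list(
  W          = unname(wrst_object$statistic),
  p_value    = wrst_object$p.value,
  estimate   = unname(wrst_object$estimate),
  conf_low   = wrst_object$conf.int[1],
  conf_high  = wrst_object$conf.int[2],
  conf_level = attr(wrst_object$conf.int, "conf.level")
)
str(wrst_summary)
```

The confidence level is stored as an attribute of `conf.int`, which is why `attr()` is used for it.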

1 Test Statistic for Wilcoxon Rank-Sum Test in R

Tied values receive the average of the ranks they span. For example, $\text{rank}(1, 3, 3, 5, 7, 7, 7) = (1, 2.5, 2.5, 4, 6, 6, 6)$.
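
The ranking above can be checked with base R's rank() function, which averages tied ranks by default. The sketch also counts how many values share each rank, which is the quantity the tie correction for large samples relies on. The variable names are illustrative.

```r
values = c(1, 3, 3, 5, 7, 7, 7)

# rank() uses ties.method = "average" by default
r = rank(values)
r

# Number of observations sharing each distinct rank
tie_sizes = table(r)
tie_sizes

# Tie term sum(t^3 - t) over the distinct ranks: 0 + 6 + 0 + 24
sum(tie_sizes^3 - tie_sizes)
```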

Let $x_i$ ($1\leq i \leq n_x$) and $y_j$ ($1\leq j \leq n_y$) be the sample values, where

$n_x$ is the number of $x$ observations, and $n_y$ is the number of $y$ observations,

$m_0$ is the population difference between the medians to be tested, as set in the null hypothesis,

$R_i$ is the rank of $x_i - m_0$ among all $x_i - m_0$ and all $y_j$ values (hence, $n_x + n_y$ values), and

the total number of observations is $n$, with $n = n_x + n_y$.

The Wilcoxon rank-sum test has a test statistic, $W$, of the form:

\[W = \sum_{i=1}^{n_x} R_i - \frac{n_x(n_x+1)}{2}.\]
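
As a small check of this formula, the sketch below computes $W$ for the samples from the first example in two ways: from the rank sum, and by counting the $(x, y)$ pairs in which $x_i - m_0$ exceeds $y_j$ (the Mann-Whitney form, with any tied pair counted as one half). The names `W_rank`, `W_pairs`, and `d` are illustrative.

```r
data_x = c(4.8, 4.2, 4.3, 3.0, 3.9)
data_y = c(4.6, 3.6, 5.0, 5.6, 3.5, 5.1, 4.7, 4.4)
m0 = 0                        # null hypothesis difference between medians
nx = length(data_x)

# Rank the shifted x values together with the y values
r = rank(c(data_x - m0, data_y))

# Sum of the x ranks minus its smallest possible value
W_rank = sum(r[seq_len(nx)]) - nx * (nx + 1) / 2

# Equivalent count of pairs with x - m0 > y; tied pairs count as 1/2
d = outer(data_x - m0, data_y, "-")
W_pairs = sum(d > 0) + 0.5 * sum(d == 0)

c(W_rank = W_rank, W_pairs = W_pairs)
```

Both values agree with the `W = 11` reported by wilcox.test() for these data.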

For dependent samples, see the Wilcoxon signed rank test for paired samples and the sign test for paired samples.

For extension to three or more groups, see the Kruskal-Wallis test.

Large Samples:

For large samples ($n_x\geq50$ or $n_y\geq50$), or cases with rank ties among the $x_i - m_0$ and $y_j$ values:

With $T$ as the number of distinct ranks, and $t_k$ as the number of observations that share the $k$-th distinct rank, inference on $W$ is based on a normal approximation obtained by standardizing $W$.

The tie term $\frac{\sum_{k=1}^{T}(t_k^3-t_k)}{n(n-1)}$ is equal to 0 if there are no ties (all $t_k = 1$), and

\[z = \frac{W - \frac{n_x n_y}{2}}{\sqrt{\frac{n_x n_y}{12}\left((n + 1) - \frac{\sum_{k=1}^{T}(t_k^3-t_k)}{n(n-1)} \right)}}.\]

Applying continuity correction for the normal distribution approximation (the default in R),

\[z = \frac{(W+c) - \frac{n_x n_y}{2}}{\sqrt{\frac{n_x n_y}{12}\left((n + 1) - \frac{\sum_{k=1}^{T}(t_k^3-t_k)}{n(n-1)} \right)}}.\]

For a two-sided test, $c=0.5$ if $W<\frac{n_x n_y}{2}$, $c=-0.5$ if $W>\frac{n_x n_y}{2}$, and $c=0$ if $W=\frac{n_x n_y}{2}$. For a one-sided test, $c=-0.5$ if the alternative is "greater", and $c=0.5$ if it is "less".
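
The rule for $c$ can be written as a small function. `continuity_c()` below is a hypothetical helper written for this illustration, not a function from the "stats" package, and the inputs are just example values.

```r
# Hypothetical helper: continuity correction c that is added to W
# before standardizing
continuity_c = function(W, nx, ny, alternative = "two.sided") {
  centre = nx * ny / 2
  switch(alternative,
         two.sided = -0.5 * sign(W - centre),  # pulls W towards the centre
         greater   = -0.5,
         less      =  0.5,
         stop("unknown alternative"))
}

continuity_c(1032, 43, 43)            # W above the centre 924.5
continuity_c(11, 5, 8)                # W below the centre 20
continuity_c(11, 5, 8, "greater")     # one-sided, right tail
```

In every case the correction moves $W$ half a unit in the direction that makes the resulting p-value slightly larger.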

Small Samples with No Rank Ties:

For small samples ($n_x < 50$ and $n_y < 50$) with no rank ties:

The p-value is based on the exact distribution of the Wilcoxon rank-sum statistic $W$ for sample sizes $n_x$ and $n_y$.
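
R exposes this exact distribution through pwilcox(). As a sketch, assuming a two-sided test on the small samples from the first example, the exact p-value can be obtained by doubling the tail probability on the side where $W$ falls. The names `p_tail` and `p_exact` are illustrative.

```r
data_x = c(4.8, 4.2, 4.3, 3.0, 3.9)
data_y = c(4.6, 3.6, 5.0, 5.6, 3.5, 5.1, 4.7, 4.4)
nx = length(data_x); ny = length(data_y)

# No ties in these data, so W is a simple count of pairs with x > y
W = sum(outer(data_x, data_y, ">"))

# Tail probability on the side of the centre nx*ny/2 where W falls
p_tail = if (W > nx * ny / 2) {
  1 - pwilcox(W - 1, nx, ny)    # P(statistic >= W)
} else {
  pwilcox(W, nx, ny)            # P(statistic <= W)
}

# Two-sided exact p-value, capped at 1
p_exact = min(2 * p_tail, 1)
p_exact
```

This gives the same value as the `p-value = 0.2222` printed by wilcox.test() for these data.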

2 Simple Wilcoxon Rank-Sum Test in R

Enter the data by hand.

data_x = c(9.47, 9.00, 11.20, 10.28, 12.60, 7.75,
           9.73, 9.58, 7.85, 12.00, 9.79, 7.23)
data_y = c(8.82, 7.79, 12.08, 15.09, 8.65, 9.82,
           10.56, 15.42, 5.34, 8.97, 7.59, 9.54)

Consider the following null hypothesis $H_0$ and alternative hypothesis $H_1$, with the level of significance $\alpha=0.05$.

$H_0:$ the difference between the population medians is equal to 0 ($m_x - m_y = 0$).

$H_1:$ the difference between the population medians is not equal to 0 ($m_x - m_y \neq 0$, hence the default two-sided).

Because the level of significance is $\alpha=0.05$, the level of confidence is $1 - \alpha = 0.95$.

The wilcox.test() function has the default alternative as "two.sided", the default difference between medians as 0, and the default level of confidence as 0.95, hence, you do not need to specify the "alternative", "mu", and "conf.level" arguments in this case.

wilcox.test(data_x, data_y,
            alternative = "two.sided",
            mu = 0, 
            conf.int = TRUE, conf.level = 0.95)

Or:

wilcox.test(data_x, data_y,
            conf.int = TRUE)


    Wilcoxon rank sum exact test

data:  data_x and data_y
W = 75, p-value = 0.8874
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -2.07  1.94
sample estimates:
difference in location 
                  0.11

The estimate of the difference between the medians, $\tilde d$, is 0.11,

test statistic, $W$, is 75,

the p-value, $p$, is 0.8874,

the 95% confidence interval is [-2.07, 1.94].

Interpretation:

Note that for wilcox.test() in R, the p-value method and the confidence interval method may disagree in some edge cases, because the p-value is based on either the exact distribution or the normal approximation, while the confidence interval is sometimes based on approximations.

P-value: With the p-value ($p = 0.8874$) being greater than the level of significance 0.05, we fail to reject the null hypothesis that the difference between the population medians is equal to 0.
Confidence Interval: With the null hypothesis difference between the medians ($m_x - m_y = 0$) being inside the confidence interval, $[-2.07, 1.94]$, we fail to reject the null hypothesis that the difference between the population medians is equal to 0.
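
The two decision rules can also be applied programmatically. The sketch below reuses the data entered above; the names `alpha`, `m0`, `fit`, `reject_by_p`, and `reject_by_ci` are illustrative.

```r
data_x = c(9.47, 9.00, 11.20, 10.28, 12.60, 7.75,
           9.73, 9.58, 7.85, 12.00, 9.79, 7.23)
data_y = c(8.82, 7.79, 12.08, 15.09, 8.65, 9.82,
           10.56, 15.42, 5.34, 8.97, 7.59, 9.54)
alpha = 0.05   # level of significance
m0 = 0         # null hypothesis difference between medians

fit = wilcox.test(data_x, data_y, mu = m0,
                  conf.int = TRUE, conf.level = 1 - alpha)

# Decision from the p-value: reject if p < alpha
reject_by_p = fit$p.value < alpha

# Decision from the confidence interval: reject if m0 lies outside it
reject_by_ci = m0 < fit$conf.int[1] || m0 > fit$conf.int[2]

c(reject_by_p = reject_by_p, reject_by_ci = reject_by_ci)
```

Both entries are `FALSE` here, matching the conclusion that we fail to reject the null hypothesis.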

3 Two-tailed Wilcoxon Rank-Sum Test in R

Using the USJudgeRatings data from the "datasets" package; 10 of the 43 rows are shown below:

Judge_data = USJudgeRatings[,1:6]
colnames(Judge_data) = c("CONT", "INTG", "DMNR", "DILG", "CFMG", "DECI")
rownames(Judge_data) = NULL
Judge_data

   CONT INTG DMNR DILG CFMG DECI
1   5.7  7.9  7.7  7.3  7.1  7.4
13  6.7  8.6  8.2  6.8  6.9  6.6
16  7.3  8.0  7.4  7.7  7.3  7.3
18  7.7  7.7  6.7  7.5  7.4  7.5
21  7.1  8.2  7.7  7.1  6.6  6.6
24  6.2  8.3  8.1  7.7  7.4  7.3
25  7.5  8.7  8.5  8.6  8.5  8.4
31  6.6  7.4  6.9  8.4  8.0  7.9
36  8.5  8.3  8.1  8.3  8.4  8.2
43  8.6  7.4  7.0  7.5  7.5  7.7

Take “INTG” as the x group and “CONT” as the y group.

Consider the following null hypothesis $H_0$ and alternative hypothesis $H_1$, with the level of significance $\alpha=0.1$ and without continuity correction.

$H_0:$ population median of $x$ minus population median of $y$ is equal to 0.3 ($m_x - m_y = 0.3$).

$H_1:$ population median of $x$ minus population median of $y$ is not equal to 0.3 ($m_x - m_y \neq 0.3$, hence the default two-sided).

Because the level of significance is $\alpha=0.1$, the level of confidence is $1 - \alpha = 0.9$.

wilcox.test(Judge_data$INTG, Judge_data$CONT,
            alternative = "two.sided", mu = 0.3,
            correct = FALSE,
            conf.int = TRUE, conf.level = 0.9)

Warning in wilcox.test.default(Judge_data$INTG, Judge_data$CONT, alternative =
"two.sided", : cannot compute exact p-value with ties

Warning in wilcox.test.default(Judge_data$INTG, Judge_data$CONT, alternative =
"two.sided", : cannot compute exact confidence intervals with ties


    Wilcoxon rank sum test

data:  Judge_data$INTG and Judge_data$CONT
W = 1200, p-value = 0.01725
alternative hypothesis: true location shift is not equal to 0.3
90 percent confidence interval:
 0.4000099 1.0000483
sample estimates:
difference in location 
             0.7000586

The warnings appear because there are ties in the data. Hence, the p-value and the confidence interval are based on the normal approximation, not the exact distribution.
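
Whether ties are present can be checked before running the test. The sketch below looks for repeated values among the shifted x values and the y values, which are the values that get ranked together. It reads the columns straight from USJudgeRatings, and the names `pooled`, `has_ties`, and `tie_sizes` are illustrative.

```r
x = USJudgeRatings$INTG
y = USJudgeRatings$CONT
mu = 0.3

# Values that are ranked together under the null hypothesis
pooled = c(x - mu, y)

# TRUE if at least one value occurs more than once
has_ties = anyDuplicated(pooled) > 0
has_ties

# Sizes of the tied groups (only groups with more than one value)
tie_sizes = table(rank(pooled))
tie_sizes[tie_sizes > 1]
```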

Interpretation:

P-value: With the p-value ($p = 0.01725$) being less than the level of significance 0.1, we reject the null hypothesis that population median of $x$ minus population median of $y$ is equal to 0.3.
Confidence Interval: With the null hypothesis difference between the medians ($m_x - m_y = 0.3$) being outside the confidence interval, $[0.4000099, 1.0000483]$, we reject the null hypothesis that the population median of $x$ minus the population median of $y$ is equal to 0.3.

4 One-tailed Wilcoxon Rank-Sum Test in R

Right Tailed Test

Using the USJudgeRatings data from the "datasets" package; 10 of the 43 rows are shown below:

Judge_data = USJudgeRatings[,7:12]
colnames(Judge_data) = c("PREP", "FAMI", "ORAL", "WRIT", "PHYS", "RTEN")
rownames(Judge_data) = NULL
Judge_data

   PREP FAMI ORAL WRIT PHYS RTEN
1   7.1  7.1  7.1  7.0  8.3  7.8
8   4.8  5.1  4.7  4.9  6.8  5.0
16  7.3  7.2  7.1  7.2  8.0  7.6
24  7.3  7.3  7.2  7.3  7.8  7.6
27  7.8  7.8  7.8  7.7  8.3  8.2
28  8.4  8.3  8.3  8.3  8.8  8.7
30  9.1  9.1  8.9  9.0  8.9  9.2
33  8.4  8.5  8.1  8.3  8.7  8.3
42  7.8  8.2  8.0  8.1  8.3  8.1
43  7.4  7.2  6.9  7.0  7.8  7.1

Take “PHYS” as the x group and “ORAL” as the y group.

Consider the following null hypothesis $H_0$ and alternative hypothesis $H_1$, with the level of significance $\alpha=0.1$.

$H_0:$ population median of $x$ minus population median of $y$ is equal to 0.4 ($m_x - m_y = 0.4$).

$H_1:$ population median of $x$ minus population median of $y$ is greater than 0.4 ($m_x - m_y > 0.4$, hence one-sided).

Because the level of significance is $\alpha=0.1$, the level of confidence is $1 - \alpha = 0.9$.

wilcox.test(Judge_data$PHYS, Judge_data$ORAL,
            alternative = "greater", mu = 0.4,
            conf.int = TRUE, conf.level = 0.9)

Warning in wilcox.test.default(Judge_data$PHYS, Judge_data$ORAL, alternative =
"greater", : cannot compute exact p-value with ties

Warning in wilcox.test.default(Judge_data$PHYS, Judge_data$ORAL, alternative =
"greater", : cannot compute exact confidence intervals with ties


    Wilcoxon rank sum test with continuity correction

data:  Judge_data$PHYS and Judge_data$ORAL
W = 1079, p-value = 0.09149
alternative hypothesis: true location shift is greater than 0.4
90 percent confidence interval:
 0.4000048       Inf
sample estimates:
difference in location 
             0.6999724

Interpretation:

P-value: With the p-value ($p = 0.09149$) being less than the level of significance 0.1, we reject the null hypothesis that the population median of $x$ minus population median of $y$ is equal to 0.4.
Confidence Interval: With the null hypothesis difference between the medians ($m_x - m_y = 0.4$) being outside the confidence interval, $[0.4000048, \infty)$, we reject the null hypothesis that population median of $x$ minus population median of $y$ is equal to 0.4.
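
As a sketch of where this right-tailed p-value comes from, the code below applies the large-sample formula with the one-sided continuity correction ($c = -0.5$ for "greater") to the same two columns, read directly from USJudgeRatings. The variable names are illustrative.

```r
x = USJudgeRatings$PHYS
y = USJudgeRatings$ORAL
mu = 0.4
nx = length(x); ny = length(y); n = nx + ny

# Test statistic from the ranks of the shifted x values
r = rank(c(x - mu, y))
W = sum(r[seq_len(nx)]) - nx * (nx + 1) / 2

# Standard deviation of W with the tie correction
tie_sizes = table(r)
sigma = sqrt((nx * ny / 12) *
             ((n + 1) - sum(tie_sizes^3 - tie_sizes) / (n * (n - 1))))

# alternative = "greater": c = -0.5, p-value from the upper tail
z = (W - 0.5 - nx * ny / 2) / sigma
p_right = pnorm(z, lower.tail = FALSE)

c(W = W, z = z, p_right = p_right)
```

The resulting `W` and `p_right` should agree with the wilcox.test() output shown above.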

Left Tailed Test

Take “PREP” as the x group and “FAMI” as the y group.

Consider the following null hypothesis $H_0$ and alternative hypothesis $H_1$, with the level of significance $\alpha=0.05$.

$H_0:$ population median of $x$ and the population median of $y$ are equal ($m_x = m_y$).

$H_1:$ population median of $x$ is less than population median of $y$ ($m_x < m_y$, hence one-sided).

Because the level of significance is $\alpha=0.05$, the level of confidence is $1 - \alpha = 0.95$.

wilcox.test(Judge_data$PREP, Judge_data$FAMI,
            alternative = "less", mu = 0,
            conf.int = TRUE, conf.level = 0.95)

Warning in wilcox.test.default(Judge_data$PREP, Judge_data$FAMI, alternative =
"less", : cannot compute exact p-value with ties

Warning in wilcox.test.default(Judge_data$PREP, Judge_data$FAMI, alternative =
"less", : cannot compute exact confidence intervals with ties


    Wilcoxon rank sum test with continuity correction

data:  Judge_data$PREP and Judge_data$FAMI
W = 911, p-value = 0.4553
alternative hypothesis: true location shift is less than 0
95 percent confidence interval:
      -Inf 0.3000546
sample estimates:
difference in location 
          -3.72148e-05

Interpretation:

P-value: With the p-value ($p = 0.4553$) being greater than the level of significance 0.05, we fail to reject the null hypothesis that the population median of $x$ and the population median of $y$ are equal.
Confidence Interval: With the null hypothesis difference between the medians ($m_x - m_y = 0$) being inside the confidence interval, $(-\infty, 0.3000546]$, we fail to reject the null hypothesis that the population median of $x$ and the population median of $y$ are equal.

5 Wilcoxon Rank-Sum Test: Test Statistics & P-values in R

Here, for a Wilcoxon rank-sum test, we show how to get the test statistic (and z-value) and the p-value from the wilcox.test() function in R, or with hand-written code.

data_x = Judge_data$RTEN; data_y = Judge_data$WRIT
wrst_object = wilcox.test(data_x, data_y,
                          alternative = "two.sided", mu = 0.1)

Warning in wilcox.test.default(data_x, data_y, alternative = "two.sided", :
cannot compute exact p-value with ties

wrst_object


    Wilcoxon rank sum test with continuity correction

data:  data_x and data_y
W = 1032, p-value = 0.3551
alternative hypothesis: true location shift is not equal to 0.1

To get the test statistic and z-value:

\[W = \sum_{i=1}^{n_x} R_i - \frac{n_x(n_x+1)}{2}.\]

With continuity correction:

\[z = \frac{(W+c) - \frac{n_x n_y}{2}}{\sqrt{\frac{n_x n_y}{12}\left((n + 1) - \frac{\sum_{k=1}^{T}(t_k^3-t_k)}{n(n-1)} \right)}}.\]

wrst_object$statistic

   W 
1032

# to remove name W
unname(wrst_object$statistic)

[1] 1032

Same as:

mu = 0.1; x = (data_x - mu)
r = rank(c(x, data_y))
nx = length(data_x); ny = length(data_y)
W = sum(r[seq_along(x)]) - nx*(nx+1)/2
W

[1] 1032

For z-value:

c = -0.5 # Given two-sided and W > (nx*ny)/2 (1032>924.5)
t = table(r)
n = nx + ny
num = (W + c) - (nx*ny)/2
a = (nx*ny)/12; b = (n+1) - sum(t^3 - t)/(n*(n-1))
denom = sqrt(a*b)
z = num/denom
z

[1] 0.9246861

To get the p-value for normal approximation:

Two-tailed: for a positive z-value ($z^+$) or a negative z-value ($z^-$),

$\text{p-value} = 2 \times P(Z>z^+)$ or $\text{p-value} = 2 \times P(Z<z^-)$.

One-tailed: for the right tail, $\text{p-value} = P(Z>z)$, and for the left tail, $\text{p-value} = P(Z<z)$.

wrst_object$p.value

[1] 0.3551292

Same as:

Note that the p-value depends on the standardized test statistic ($z = 0.9246861$). We use pnorm(), the distribution function for the normal distribution in R.

2*(1-pnorm(0.9246861)); 2*pnorm(-0.9246861)

[1] 0.3551292

[1] 0.3551292

One-tailed example:

# Right tailed
1-pnorm(0.9246861)
# Left tailed
pnorm(-0.9246861)
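
An alternative way to write these tail probabilities is with the `lower.tail` argument of pnorm(), which avoids forming `1 - pnorm(z)` and, combined with abs(), gives one two-tailed expression for either sign of z. This is a sketch using the z-value computed above; the names `p_right` and `p_two` are illustrative.

```r
z = 0.9246861

# Upper-tail probability, equivalent to 1 - pnorm(z)
p_right = pnorm(z, lower.tail = FALSE)

# Two-tailed p-value that works for positive or negative z
p_two = 2 * pnorm(abs(z), lower.tail = FALSE)

c(p_right = p_right, p_two = p_two)
```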
