Here, we discuss the one sample Wilcoxon signed-rank test in R with interpretations, including test statistics, p-values, and confidence intervals.
The one sample Wilcoxon signed-rank test in R can be performed with the wilcox.test() function from the base "stats" package.
The one sample Wilcoxon signed-rank test, under the assumption of a symmetric distribution, can be used to test whether the median of the population from which an independent random sample is drawn is equal to a certain value (stated in the null hypothesis). It is a non-parametric alternative to the one sample t-test.
In the one sample Wilcoxon signed-rank test, the test statistic is based on signed ranks. It is the sum of the ranks of the observed values that are greater than the null hypothesis median, where the ranks are based on the distances of the observed values from the null hypothesis median.
| Question | Is the median equal to \(m_0\)? | Is the median greater than \(m_0\)? | Is the median less than \(m_0\)? |
|---|---|---|---|
| Form of Test | Two-tailed test | Right-tailed test | Left-tailed test |
| Null Hypothesis, \(H_0\) | \(m = m_0\) | \(m = m_0\) | \(m = m_0\) |
| Alternative Hypothesis, \(H_1\) | \(m \neq m_0\) | \(m > m_0\) | \(m < m_0\) |
# Create the data for the one sample Wilcoxon signed-rank test
data = c(0.2, 0.1, -1.0, -0.4, 1.4, 0.5)
# Run the one sample Wilcoxon signed-rank test with specifications
wilcox.test(data, alternative = "two.sided",
mu = 0,
conf.int = TRUE, conf.level = 0.95)
Wilcoxon signed rank exact test
data: data
V = 13, p-value = 0.6875
alternative hypothesis: true location is not equal to 0
95 percent confidence interval:
-1.0 1.4
sample estimates:
(pseudo)median
0.15
| Argument | Usage |
|---|---|
| x | Sample data values |
| mu | Population median value in the null hypothesis, (default = 0) |
| alternative | Set the alternative hypothesis as "greater", "less", or the default "two.sided" |
| exact | By default (exact = NULL), an exact p-value is computed when n < 50 and there are no zeroes and no rank ties; set to FALSE to compute the p-value from the normal approximation instead |
| correct | When the normal approximation is used: set to FALSE to remove the continuity correction, (default = TRUE) |
| conf.int | Set to TRUE to include the confidence interval and the (pseudo)median estimate, (default = FALSE) |
| conf.level | Confidence level of the interval, (default = 0.95) |
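The effect of the exact and correct arguments can be seen in the method string that wilcox.test() stores in its result. A minimal sketch, assuming the six-value data from the first example above (no ties, no zeros, n < 50):

```r
# Data from the first example on this page
data = c(0.2, 0.1, -1.0, -0.4, 1.4, 0.5)

# Default exact = NULL: n < 50, no ties, no zeros, so the exact test is used
res_exact = wilcox.test(data, mu = 0)

# Normal approximation, with the default continuity correction
res_cc = wilcox.test(data, mu = 0, exact = FALSE)

# Normal approximation without continuity correction
res_nocc = wilcox.test(data, mu = 0, exact = FALSE, correct = FALSE)

# The method string records which calculation was used
res_exact$method
res_cc$method
res_nocc$method
```

Here \(V = 13\) lies above \(n(n+1)/4 = 10.5\), so the continuity correction pulls \(z\) toward 0 and gives a larger p-value than the uncorrected approximation.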
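# Create data (random, so the value of V will differ between runs)
data = rnorm(30)
# Store the test result as an object (a list of class "htest")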
wsrt_object = wilcox.test(data, alternative = "two.sided",
mu = 0,
conf.int = TRUE, conf.level = 0.95)
# Extract a component
wsrt_object$statistic
V
213
| Test Component | Usage |
|---|---|
| wsrt_object$statistic | Test statistic value, \(V\) |
| wsrt_object$p.value | P-value |
| wsrt_object$estimate | Point estimate of the (pseudo)median when conf.int = TRUE |
| wsrt_object$conf.int | Confidence interval when conf.int = TRUE |
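A short sketch of extracting each component, assuming the six-value data from the first example; with conf.int = TRUE the estimate and confidence interval are also stored:

```r
# Data from the first example on this page
data = c(0.2, 0.1, -1.0, -0.4, 1.4, 0.5)
wsrt_object = wilcox.test(data, alternative = "two.sided", mu = 0,
                          conf.int = TRUE, conf.level = 0.95)

wsrt_object$statistic   # test statistic V
wsrt_object$p.value     # p-value
wsrt_object$estimate    # (pseudo)median, only when conf.int = TRUE
wsrt_object$conf.int    # confidence interval, only when conf.int = TRUE
```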
Let \(x_i\) be the sample values,
\(m_0\) be the population median value to be tested and set in the null hypothesis,
\(R_i\) be the rank of \(|x_i - m_0|\) among all absolute differences (distances), where tied values receive the average of the ranks they span, e.g. \(\text{rank}(1, 2, 2, 3, 4, 4, 4) = (1, 2.5, 2.5, 4, 6, 6, 6)\),
\(I_{(x_i-m_0)>0}\) be \(1\) when \((x_i-m_0)>0\) and \(0\) otherwise, and
\(N\) be the sample size.
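In R, rank() uses average ranks for ties by default (ties.method = "average"), which reproduces the tied-rank example above:

```r
# Tied values receive the average of the ranks they occupy
rank(c(1, 2, 2, 3, 4, 4, 4))
# [1] 1.0 2.5 2.5 4.0 6.0 6.0 6.0
```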
The one sample Wilcoxon signed-rank test has the test statistic, \(V\), of the form:
\[V = \sum_{i=1}^N R_i \cdot I_{(x_i-m_0)>0}.\]
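The formula for \(V\) can be computed directly in R. A minimal sketch, assuming the six-value data from the first example; like wilcox.test(), it drops any zero differences before ranking (the variable names d and Ri are illustrative):

```r
# Data and null-hypothesis median from the first example on this page
data = c(0.2, 0.1, -1.0, -0.4, 1.4, 0.5)
m0 = 0

d = data - m0
d = d[d != 0]               # zero differences are dropped, as in wilcox.test()
Ri = rank(abs(d))           # R_i: ranks of |x_i - m0|
V = sum(Ri * (d > 0))       # sum of R_i times the indicator I(x_i - m0 > 0)
V
# [1] 13
```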
See also the sign test for one sample and the Wilcoxon signed rank test for paired samples.
For large samples (\(N\geq50\)), or cases with rank ties or at least one \((x_i - m_0) = 0\):
Inference on \(V\) is based on the normal distribution approximation. Let \(n\) be the number of non-zero \((x_i - m_0)\), \(T\) the number of distinct rank values (groups of tied ranks), and \(t_k\) the number of values tied at the \(k\)-th rank value.
The tie-correction term \(\frac{\sum_{k=1}^{T}(t_k^3-t_k)}{48}\) equals \(0\) if there are no ties (all \(t_k = 1\)). Without continuity correction,
\[z = \frac{V - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}-\frac{\sum_{k=1}^{T}(t_k^3-t_k)}{48}}}.\] With continuity correction (the default in R) for the normal distribution approximation,
\[z = \frac{(V + c) - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}-\frac{\sum_{k=1}^{T}(t_k^3-t_k)}{48}}}.\] For a two-sided test, \(c=-0.5\) when \(V>\frac{n(n+1)}{4}\), \(c=0.5\) when \(V<\frac{n(n+1)}{4}\), and \(c=0\) when \(V=\frac{n(n+1)}{4}\). For a one-sided test, \(c=-0.5\) when the alternative is "greater", and \(c=0.5\) when it is "less".
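The normal approximation above can be written as a small R function. This is a sketch under the assumption that it mirrors how wilcox.test() handles zeros (dropped) and ties (average ranks); the function name wsr_normal() is hypothetical and not part of the stats package. It is applied here to the sleep$extra example further below, with \(m_0 = 0.5\) and no continuity correction:

```r
# Normal-approximation p-value for the one sample Wilcoxon signed-rank test.
# wsr_normal() is a hypothetical helper, not a function from the stats package.
wsr_normal = function(x, m0 = 0, alternative = "two.sided", correct = TRUE) {
  d = x - m0
  d = d[d != 0]                       # drop zero differences
  n = length(d)                       # n: number of non-zero differences
  r = rank(abs(d))                    # average ranks for ties
  V = sum(r[d > 0])                   # test statistic V
  tk = table(r)                       # t_k: size of each group of tied ranks
  mean_V = n * (n + 1) / 4
  sd_V = sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(tk^3 - tk) / 48)
  cc = 0                              # c in the formula above
  if (correct) {
    cc = switch(alternative,
                two.sided = -0.5 * sign(V - mean_V),
                greater = -0.5,
                less = 0.5)
  }
  z = ((V + cc) - mean_V) / sd_V
  p = switch(alternative,
             two.sided = 2 * pnorm(-abs(z)),
             greater = pnorm(z, lower.tail = FALSE),
             less = pnorm(z))
  list(V = V, z = z, p.value = p)
}

# Sleep example: m0 = 0.5, no continuity correction
wsr_normal(sleep$extra, m0 = 0.5, correct = FALSE)
```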
For small sample sizes (\(N<50\)) with no rank ties and no \((x_i - m_0)=0\):
The p-value is based on the exact distribution of the Wilcoxon signed rank statistic \(V\) for sample size \(N\) (available in R through psignrank()).
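A sketch of the exact two-sided p-value for the six-value data from the first example, using psignrank() for the null distribution of \(V\) and doubling the tail probability on the side where \(V\) falls:

```r
# Data from the first example on this page (no ties, no zeros, N < 50)
data = c(0.2, 0.1, -1.0, -0.4, 1.4, 0.5)
d = data - 0
r = rank(abs(d))
V = sum(r[d > 0])      # 13
n = length(d)          # 6
mean_V = n * (n + 1) / 4

# Tail probability on the side of V, from the exact null distribution of V
p_one = if (V > mean_V) psignrank(V - 1, n, lower.tail = FALSE) else psignrank(V, n)
p_value = min(1, 2 * p_one)   # two-sided exact p-value
p_value
# [1] 0.6875
```

Since \(V = 13 > 10.5\), the upper tail \(P(V \geq 13) = 22/64\) is used, giving \(2 \times 22/64 = 0.6875\), matching the first output on this page.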
Enter the data by hand.
data = c(0.78, -0.08, 0.25, -0.03, -0.04, 1.37,
-0.23, 1.52, -1.55, 0.58, 0.12, 0.22,
0.38, -0.50, -0.33, -1.02, -1.07, 0.30)
Consider the following null hypothesis \(H_0\) and alternative hypothesis \(H_1\), with the level of significance \(\alpha=0.05\):
\(H_0:\) the population median is equal to 0 (\(m = 0\)).
\(H_1:\) the population median is not equal to 0 (\(m \neq 0\), hence the default two-sided).
Because the level of significance is \(\alpha=0.05\), the level of confidence is \(1 - \alpha = 0.95\).
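The wilcox.test() call for this example is not shown on this page. A sketch of calls consistent with the hypotheses above and the output below, first relying on the defaults and then spelling them out:

```r
data = c(0.78, -0.08, 0.25, -0.03, -0.04, 1.37,
         -0.23, 1.52, -1.55, 0.58, 0.12, 0.22,
         0.38, -0.50, -0.33, -1.02, -1.07, 0.30)

# Relying on the defaults: alternative = "two.sided", mu = 0, conf.level = 0.95
res_default = wilcox.test(data, conf.int = TRUE)
res_default

# Or, spelling out the same settings explicitly
res_explicit = wilcox.test(data, alternative = "two.sided", mu = 0,
                           conf.int = TRUE, conf.level = 0.95)
res_explicit
```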
The wilcox.test() function has "two.sided" as the default alternative, 0 as the default median (mu), and 0.95 as the default level of confidence; hence, you do not need to specify the "alternative", "mu", and "conf.level" arguments in this case. You still need conf.int = TRUE to obtain the confidence interval and the estimate.
Wilcoxon signed rank exact test
data: data
V = 92, p-value = 0.7987
alternative hypothesis: true location is not equal to 0
95 percent confidence interval:
-0.365 0.400
sample estimates:
(pseudo)median
0.045
The estimate of the (pseudo)median, \(\tilde x\), is 0.045,
the test statistic, \(V\), is 92,
the p-value, \(p\), is 0.7987, and
the 95% confidence interval is [-0.365, 0.400].
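In the exact case, the (pseudo)median reported by wilcox.test() is the Hodges-Lehmann estimate: the median of the Walsh averages \((x_i + x_j)/2\) for \(i \le j\). A sketch that reproduces it for the data above (when ties or zeros force the normal approximation, wilcox.test() finds the estimate numerically instead, which is why later estimates such as 1.49993 are not round numbers):

```r
data = c(0.78, -0.08, 0.25, -0.03, -0.04, 1.37,
         -0.23, 1.52, -1.55, 0.58, 0.12, 0.22,
         0.38, -0.50, -0.33, -1.02, -1.07, 0.30)

# Walsh averages (x_i + x_j)/2 for all pairs with i <= j, including i = j
walsh = outer(data, data, "+") / 2
walsh = walsh[upper.tri(walsh, diag = TRUE)]
length(walsh)    # n(n + 1)/2 = 171 averages for n = 18
median(walsh)    # Hodges-Lehmann estimate (pseudomedian)
```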
Note that for wilcox.test() in R, the p-value and confidence interval approaches below may disagree in some edge cases, because the p-value is based on either the exact distribution or the normal approximation, while the confidence interval is sometimes computed by approximation.
P-value: With the p-value (\(p = 0.7987\)) being greater than the level of significance 0.05, we fail to reject the null hypothesis that the population median is equal to 0.
Confidence Interval: With the null hypothesis median value (\(m = 0\)) being inside the confidence interval, \([-0.365, 0.400]\), we fail to reject the null hypothesis that the population median is equal to 0.
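Both decision rules can be checked from the returned object. A sketch with a hypothetical helper, decide(), which treats the confidence interval as closed and uses the data from this example:

```r
# Hypothetical helper: applies the p-value and confidence interval decision rules
decide = function(res, m0, alpha) {
  ci = res$conf.int
  c(reject_by_p = res$p.value < alpha,
    reject_by_ci = m0 < ci[1] || m0 > ci[2])
}

data = c(0.78, -0.08, 0.25, -0.03, -0.04, 1.37,
         -0.23, 1.52, -1.55, 0.58, 0.12, 0.22,
         0.38, -0.50, -0.33, -1.02, -1.07, 0.30)
res = wilcox.test(data, mu = 0, conf.int = TRUE, conf.level = 0.95)
decide(res, m0 = 0, alpha = 0.05)
```

Because one-sided intervals have an infinite bound (-Inf or Inf), the same helper also works for the "greater" and "less" examples below.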
Using the sleep$extra data from the "datasets" package, with 10 of its 20 observations shown below:
[1] 0.7 -0.1 3.4 3.7 2.0 1.1 4.4 5.5 1.6 3.4
Consider the following null hypothesis \(H_0\) and alternative hypothesis \(H_1\), with the level of significance \(\alpha=0.1\), without continuity correction:
\(H_0:\) the population median is equal to 0.5 (\(m = 0.5\)).
\(H_1:\) the population median is not equal to 0.5 (\(m \neq 0.5\), hence the default two-sided).
Because the level of significance is \(\alpha=0.1\), the level of confidence is \(1 - \alpha = 0.9\).
wilcox.test(sleep$extra, alternative = "two.sided",
mu = 0.5, correct = FALSE,
conf.int = TRUE, conf.level = 0.9)
Warning in wilcox.test.default(sleep$extra, alternative = "two.sided", mu =
0.5, : cannot compute exact p-value with ties
Warning in wilcox.test.default(sleep$extra, alternative = "two.sided", mu =
0.5, : cannot compute exact confidence interval with ties
Wilcoxon signed rank test
data: sleep$extra
V = 152, p-value = 0.07924
alternative hypothesis: true location is not equal to 0.5
90 percent confidence interval:
0.5500271 2.2500210
sample estimates:
(pseudo)median
1.49993
The warnings appear because there are ties among the absolute differences \(|x_i - 0.5|\). Hence, the p-value is based on the normal approximation rather than the exact distribution.
P-value: With the p-value (\(p = 0.07924\)) being less than the level of significance 0.1, we reject the null hypothesis that the population median is equal to 0.5.
Confidence Interval: With the null hypothesis median value (\(m = 0.5\)) being outside the confidence interval, \([0.5500271, 2.2500210]\), we reject the null hypothesis that the population median is equal to 0.5.
Using the trees$Girth data from the "datasets" package, with 10 of its 31 observations shown below:
[1] 8.3 10.5 11.3 11.4 12.0 12.9 13.7 13.8 14.5 20.6
Consider the following null hypothesis \(H_0\) and alternative hypothesis \(H_1\), with the level of significance \(\alpha=0.1\):
\(H_0:\) the population median is equal to 12.5 (\(m = 12.5\)).
\(H_1:\) the population median is greater than 12.5 (\(m > 12.5\), hence one-sided).
Because the level of significance is \(\alpha=0.1\), the level of confidence is \(1 - \alpha = 0.9\).
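The wilcox.test() call for this example is not shown on this page. Judging from the warnings and the printed output, it was presumably similar to the following (correct = TRUE is the default, hence "with continuity correction"):

```r
# Presumed call for the right-tailed trees$Girth example (m0 = 12.5, 90% CI)
wilcox.test(trees$Girth, alternative = "greater", mu = 12.5,
            conf.int = TRUE, conf.level = 0.9)
```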
Warning in wilcox.test.default(trees$Girth, alternative = "greater", mu = 12.5,
: cannot compute exact p-value with ties
Warning in wilcox.test.default(trees$Girth, alternative = "greater", mu = 12.5,
: cannot compute exact confidence interval with ties
Wilcoxon signed rank test with continuity correction
data: trees$Girth
V = 291.5, p-value = 0.1996
alternative hypothesis: true location is greater than 12.5
90 percent confidence interval:
12.30006 Inf
sample estimates:
(pseudo)median
12.99677
P-value: With the p-value (\(p = 0.1996\)) being greater than the level of significance 0.1, we fail to reject the null hypothesis that the population median is equal to 12.5.
Confidence Interval: With the null hypothesis median value (\(m = 12.5\)) being inside the confidence interval, \([12.30006, \infty)\), we fail to reject the null hypothesis that the population median is equal to 12.5.
Consider the following null hypothesis \(H_0\) and alternative hypothesis \(H_1\), with the level of significance \(\alpha=0.05\):
\(H_0:\) the population median is equal to 16 (\(m = 16\)).
\(H_1:\) the population median is less than 16 (\(m < 16\), hence one-sided).
Because the level of significance is \(\alpha=0.05\), the level of confidence is \(1 - \alpha = 0.95\).
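As in the previous example, the call itself is not shown on this page; a call consistent with the warnings and output below would presumably be:

```r
# Presumed call for the left-tailed trees$Girth example (m0 = 16, 95% CI)
wilcox.test(trees$Girth, alternative = "less", mu = 16,
            conf.int = TRUE, conf.level = 0.95)
```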
Warning in wilcox.test.default(trees$Girth, alternative = "less", mu = 16, :
cannot compute exact p-value with ties
Warning in wilcox.test.default(trees$Girth, alternative = "less", mu = 16, :
cannot compute exact confidence interval with ties
Warning in wilcox.test.default(trees$Girth, alternative = "less", mu = 16, :
cannot compute exact p-value with zeroes
Warning in wilcox.test.default(trees$Girth, alternative = "less", mu = 16, :
cannot compute exact confidence interval with zeroes
Wilcoxon signed rank test with continuity correction
data: trees$Girth
V = 47.5, p-value = 7.363e-05
alternative hypothesis: true location is less than 16
95 percent confidence interval:
-Inf 14.20002
sample estimates:
(pseudo)median
12.84995
The warnings appear because there are ties in the data, and \(x_i - m_0 = 0\) for at least one observation. Hence, the p-value is based on the normal approximation rather than the exact distribution.
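Both sources of the warnings can be checked directly. A short sketch for \(m_0 = 16\):

```r
# Differences from the null-hypothesis median in the left-tailed example
d = trees$Girth - 16
any(d == 0)                  # TRUE: 16.0 is in the data, giving a zero difference
nz = d[d != 0]
any(duplicated(abs(nz)))     # TRUE: tied |differences|, e.g. the two 11.0 values
```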
P-value: With the p-value (\(p = 7.363 \times 10^{-5}\)) being less than the level of significance 0.05, we reject the null hypothesis that the population median is equal to 16.
Confidence Interval: With the null hypothesis median value (\(m = 16\)) being outside the confidence interval, \((-\infty, 14.20002]\), we reject the null hypothesis that the population median is equal to 16.
Here, for a one sample Wilcoxon signed-rank test, we show how to obtain the test statistic (and z-value) and the p-value from the wilcox.test() function in R, or by writing the code ourselves.
data_os = trees$Girth
wsrt_object = wilcox.test(data_os,
alternative = "two.sided",
correct = TRUE,
mu = 15)
Warning in wilcox.test.default(data_os, alternative = "two.sided", correct =
TRUE, : cannot compute exact p-value with ties
Wilcoxon signed rank test with continuity correction
data: data_os
V = 104, p-value = 0.004913
alternative hypothesis: true location is not equal to 15
\[V = \sum_{i=1}^N R_i \cdot I_{(x_i-m_0)>0}.\]
With continuity correction:
\[z = \frac{(V + c) - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{24}-\frac{\sum_{k=1}^{T}(t_k^3-t_k)}{48}}}.\]
For a two-sided test, \(c=-0.5\) when \(V>\frac{n(n+1)}{4}\), \(c=0.5\) when \(V<\frac{n(n+1)}{4}\), and \(c=0\) when \(V=\frac{n(n+1)}{4}\). For a one-sided test, \(c=-0.5\) when the alternative is "greater", and \(c=0.5\) when it is "less".
V
104
[1] 104
Same as:
[1] 104
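The code behind the outputs above is not shown on this page. A sketch that reproduces \(V = 104\), first from the test object and then from the formula (warnings about ties are suppressed here with suppressWarnings()):

```r
data_os = trees$Girth
# Warnings about ties are suppressed in this sketch
wsrt_object = suppressWarnings(wilcox.test(data_os, alternative = "two.sided",
                                           correct = TRUE, mu = 15))
wsrt_object$statistic        # V from the test object

diffs = data_os - 15
diffs = diffs[diffs != 0]    # drop zero differences (none here)
r = rank(abs(diffs))         # R_i, with average ranks for ties
sum(r[diffs > 0])            # V: sum of the ranks of the positive differences

# Same as, using the indicator form of the formula
sum(r * (diffs > 0))
```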
For z-value:
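diffs = data_os - 15
diffs = diffs[diffs != 0]    # drop zero differences, as wilcox.test() does
r = rank(abs(diffs))         # R_i, with average ranks for ties
V = sum(r[diffs > 0])        # test statistic V
n = length(diffs)            # number of non-zero differences
c = 0.5 # Given two-sided and V < n*(n + 1)/4 (104 < 248)
t = table(r)                 # t_k: sizes of the tied-rank groups
num = (V + c) - n*(n + 1)/4
denom = sqrt(n*(n+1)*(2*n+1)/24 - sum(t^3 - t)/48)
z = num/denom
z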
[1] -2.812712
Two-tailed: for a positive z-value (\(z^+\)) or a negative z-value (\(z^-\)),
\(\text{p-value} = 2P(Z>z^+)\) or \(\text{p-value} = 2P(Z<z^-)\), respectively.
One-tailed: for a right-tailed test, \(\text{p-value} = P(Z>z)\); for a left-tailed test, \(\text{p-value} = P(Z<z)\).
[1] 0.004912565
Same as:
Note that the p-value depends on the test statistic (\(z = -2.812712\)). We also use the distribution function pnorm() for the normal distribution in R.
[1] 0.004912563
[1] 0.004912563
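The code behind the p-values above is not shown on this page. A self-contained sketch that recomputes \(z\) and the two-sided p-value with pnorm(), and compares it with the value stored in the test object:

```r
data_os = trees$Girth
wsrt_object = suppressWarnings(wilcox.test(data_os, alternative = "two.sided",
                                           correct = TRUE, mu = 15))

diffs = data_os - 15
diffs = diffs[diffs != 0]
r = rank(abs(diffs))
V = sum(r[diffs > 0])
n = length(diffs)
tk = table(r)                        # t_k: sizes of tied-rank groups
cc = 0.5                             # two-sided with V < n(n + 1)/4
z = ((V + cc) - n * (n + 1) / 4) /
  sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(tk^3 - tk) / 48)

wsrt_object$p.value                    # p-value from the test object
2 * pnorm(z)                           # z is negative: 2 * P(Z < z)
2 * pnorm(abs(z), lower.tail = FALSE)  # same value, using the upper tail
```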
One-tailed example:
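The one-tailed example itself is not shown on this page. As a hypothetical illustration, here is the left-tailed version for the same data (\(H_1: m < 15\)), where \(c = 0.5\) and the p-value is \(P(Z < z)\):

```r
data_os = trees$Girth
# Hypothetical one-tailed setting: H1 is m < 15 (left-tailed)
wsrt_less = suppressWarnings(wilcox.test(data_os, alternative = "less",
                                         correct = TRUE, mu = 15))

diffs = data_os - 15
diffs = diffs[diffs != 0]
r = rank(abs(diffs))
V = sum(r[diffs > 0])
n = length(diffs)
tk = table(r)
cc = 0.5                             # alternative "less": c = 0.5
z = ((V + cc) - n * (n + 1) / 4) /
  sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(tk^3 - tk) / 48)

wsrt_less$p.value                    # p-value from the test object
pnorm(z)                             # left tail: P(Z < z)
```

Because \(V < n(n+1)/4\) here, the two-sided and left-tailed tests use the same \(c = 0.5\), so the left-tailed p-value is half of the two-sided one.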
Copyright © 2020 - 2024. All Rights Reserved by Stats Codes