Here, we discuss the Kolmogorov-Smirnov test in R, including its interpretation, comparing one sample to a distribution, and comparing two samples.
The Kolmogorov-Smirnov test in R can be performed with the ks.test() function from the base "stats" package.
The Kolmogorov-Smirnov test is a non-parametric test that can be used to test whether a sample fits a distribution, or if two samples are from the same distribution.
Function | Usage |
ks.test(Sample, CDF Function) | Test if sample is from a distribution |
ks.test(Sample 1, Sample 2) | Test if two samples are from the same distribution |
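As a minimal sketch of the two call forms in the table, the following shows that both return an object of class "htest" whose components can be extracted directly. The data, the seed, and the variable names are illustrative assumptions.

```r
# Illustrative data; the seed and sample sizes are arbitrary assumptions.
set.seed(123)
x <- rnorm(25)            # sample to compare with a reference distribution
y <- rnorm(30, mean = 1)  # second sample for the two-sample form

# One-sample form: a sample plus the name of a CDF function
one_sample <- ks.test(x, "pnorm")

# Two-sample form: two numeric vectors
two_sample <- ks.test(x, y)

# Both calls return an "htest" object holding the statistic and p-value
one_sample$statistic
one_sample$p.value
two_sample$statistic
two_sample$p.value
```

Any extra arguments after the CDF name, such as a mean or degrees of freedom, are passed on to the CDF function.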
The Kolmogorov-Smirnov test statistic, \(D_n \in [0,1]\), for a hypothesized cumulative distribution function \(F(x)\) is: \[D_n= \sup_x |F_n(x)-F(x)|,\] where \(F_n\) is the empirical cumulative distribution function of the sample data with \(n\) observations. That is, \(D_n\) is the largest absolute difference between the empirical and the hypothesized cumulative distribution functions over all \(x\) values.
Therefore, for a given sample size, the higher the value of \(D_n\), the more the sample departs from \(F(x)\), and the smaller the p-value.
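To make the definition concrete, here is a rough sketch that computes \(D_n\) by hand and compares it with the statistic reported by ks.test(). The seed, the sample, and the names D_manual and D_ks are assumptions made for illustration. Because \(F_n\) is a step function, the supremum only needs to be checked at the ordered sample points, just before and at each jump.

```r
set.seed(1)                          # arbitrary seed (assumption)
x <- sort(rnorm(30, mean = 10, sd = 2))  # ordered sample
n <- length(x)
Fx <- pnorm(x, mean = 9, sd = 1.5)   # hypothesized CDF at the ordered points

# The ECDF jumps from (i - 1)/n to i/n at the i-th ordered point,
# so the largest gap occurs either at a jump or just before it.
d_above <- max(seq_len(n) / n - Fx)        # ECDF above F
d_below <- max(Fx - (seq_len(n) - 1) / n)  # ECDF below F
D_manual <- max(d_above, d_below)

# Statistic as reported by ks.test() for the same sample and CDF
D_ks <- unname(ks.test(x, "pnorm", mean = 9, sd = 1.5)$statistic)

all.equal(D_manual, D_ks)  # the two values should agree
```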
The one-sample Kolmogorov-Smirnov test null and alternative hypotheses are:
\(H_0\): The sample is from the distribution \(F(x)\).
\(H_1\): The sample is NOT from the distribution \(F(x)\).
With level of significance \(\alpha = 0.05\), test if a sample of \(30\) observations from the normal distribution with \(\tt{mean = 10}\) and \(\tt{sd = 2}\) is from the normal distribution with \(\tt{mean = 9}\) and \(\tt{sd = 1.5}\).
Exact one-sample Kolmogorov-Smirnov test
data: sample
D = 0.27658, p-value = 0.01608
alternative hypothesis: two-sided
The large \(D_n\) and the \(\tt{p-value}\) below the level of significance \((\alpha = 0.05)\) are consistent with the sample having been drawn from a different distribution. Since the \(\tt{p-value}\;(0.01608)\) is less than \(\alpha = 0.05\), we reject \(H_0\) that the sample is from the normal distribution with \(\tt{mean = 9}\) and \(\tt{sd = 1.5}\).
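A test like the one above could be run along the following lines. The seed is a hypothetical choice, so the reported D and p-value will differ from the output quoted above, and the decision may differ for other random draws.

```r
set.seed(2024)  # hypothetical seed; not the draw behind the quoted output

# 30 observations from the normal distribution with mean = 10 and sd = 2
sample <- rnorm(30, mean = 10, sd = 2)

# Compare against the normal CDF with mean = 9 and sd = 1.5;
# the extra arguments are passed on to pnorm()
result <- ks.test(sample, "pnorm", mean = 9, sd = 1.5)
result

# Decision at the 5% level: TRUE means reject H0
alpha <- 0.05
result$p.value < alpha
```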
See also the normal distribution tests.
With level of significance \(\alpha = 0.05\), test if a sample of \(50\) observations from the Student’s t-distribution with \(\tt{degrees \; of \; freedom = 12}\) is from a Student’s t-distribution with \(\tt{degrees \; of \; freedom = 12}\).
Exact one-sample Kolmogorov-Smirnov test
data: sample
D = 0.087856, p-value = 0.8029
alternative hypothesis: two-sided
The small \(D_n\) and the \(\tt{p-value}\) above the level of significance \((\alpha = 0.05)\) are consistent with the sample coming from the hypothesized distribution. Since the \(\tt{p-value}\;(0.8029)\) is greater than \(\alpha = 0.05\), we fail to reject \(H_0\) that the sample is from \(X \sim t_{12}\).
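A sketch of how this test could be run is shown below. The seed is again a hypothetical choice, so the numbers will not match the output quoted above.

```r
set.seed(2024)  # hypothetical seed; not the draw behind the quoted output

# 50 observations from the Student's t-distribution with 12 degrees of freedom
sample <- rt(50, df = 12)

# Compare against the t CDF with the same degrees of freedom;
# df is passed on to pt()
result <- ks.test(sample, "pt", df = 12)
result

# Decision at the 5% level: TRUE means fail to reject H0
alpha <- 0.05
result$p.value >= alpha
```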
To test whether two samples have different underlying probability distributions, the test statistic, \(D_{n,m} \in [0,1]\), is: \[D_{n,m}=\sup_x |F_{1,n}(x)-F_{2,m}(x)|,\]
where \(F_{1,n}\) and \(F_{2,m}\) are the empirical cumulative distribution functions of Sample 1 and Sample 2, with \(n\) and \(m\) observations, respectively.
Therefore, for given sample sizes, the higher the value of \(D_{n,m}\), the more different the two empirical distributions are, and the smaller the p-value.
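The two-sample statistic can also be checked by hand. In this sketch (the seed, the samples, and the names D_manual and D_ks are illustrative assumptions), both empirical CDFs are step functions that change only at observed values, so it is enough to evaluate their difference at the pooled sample points.

```r
set.seed(7)  # arbitrary seed (assumption)
sample1 <- rexp(40, rate = 0.2)
sample2 <- rexp(35, rate = 0.2)

# Empirical CDFs of the two samples, returned as functions
F1 <- ecdf(sample1)
F2 <- ecdf(sample2)

# The difference F1 - F2 only changes at observed values,
# so the supremum is reached at one of the pooled points.
pooled <- c(sample1, sample2)
D_manual <- max(abs(F1(pooled) - F2(pooled)))

# Statistic as reported by ks.test() for the same two samples
D_ks <- unname(ks.test(sample1, sample2)$statistic)

all.equal(D_manual, D_ks)  # the two values should agree
```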
The two-sample Kolmogorov-Smirnov test null and alternative hypotheses are:
\(H_0\): The two samples are from the same distribution.
\(H_1\): The two samples are NOT from the same distribution.
With level of significance \(\alpha = 0.05\), test if two samples, one with \(40\) observations and another with \(35\) observations, both from the exponential distribution with \(\tt{rate = 0.2}\), are from the same distribution.
Exact two-sample Kolmogorov-Smirnov test
data: sample1 and sample2
D = 0.15, p-value = 0.7319
alternative hypothesis: two-sided
The small \(D_{n,m}\) and the \(\tt{p-value}\) above the level of significance \((\alpha = 0.05)\) are consistent with the samples being from the same distribution. Since the \(\tt{p-value\;(0.7319)}\) is greater than \(\alpha = 0.05\), we fail to reject \(H_0\) that the two samples are from the same distribution.
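One way this two-sample test could be run is sketched below. The seed is a hypothetical choice, so the output will differ from the figures quoted above.

```r
set.seed(2024)  # hypothetical seed; not the draw behind the quoted output

# Two independent samples from the exponential distribution with rate = 0.2
sample1 <- rexp(40, rate = 0.2)
sample2 <- rexp(35, rate = 0.2)

# Two-sample test: pass both numeric vectors
result <- ks.test(sample1, sample2)
result

# Decision at the 5% level: TRUE means fail to reject H0
alpha <- 0.05
result$p.value >= alpha
```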
With level of significance \(\alpha = 0.05\), test if a sample with \(50\) observations from the binomial distribution with \(\tt{size = 10}\) and \(\tt{prob = 0.8}\) is from the same distribution as another sample with \(50\) observations from the Poisson distribution with \(\tt{mean = 6}\).
Exact two-sample Kolmogorov-Smirnov test
data: sample1 and sample2
D = 0.48, p-value = 4.041e-06
alternative hypothesis: two-sided
The large \(D_{n,m}\) and the \(\tt{p-value}\) below the level of significance \((\alpha = 0.05)\) are consistent with the samples being from different distributions. Since the \(\tt{p-value\;(4.041e-06)}\) is less than \(\alpha = 0.05\), we reject \(H_0\) that the two samples are from the same distribution.
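A sketch of this comparison follows, with a hypothetical seed, so the numbers will differ from the output quoted above. Both samples are discrete and therefore contain ties. Depending on the R version, ks.test() may warn that an exact p-value cannot be computed with ties and fall back to an approximation.

```r
set.seed(2024)  # hypothetical seed; not the draw behind the quoted output

# Sample 1: binomial with size = 10 and prob = 0.8 (mean 8)
sample1 <- rbinom(50, size = 10, prob = 0.8)

# Sample 2: Poisson with mean 6 (the rpois() argument is named lambda)
sample2 <- rpois(50, lambda = 6)

# Two-sample test; ties are expected because the data are counts
result <- ks.test(sample1, sample2)
result

# Decision at the 5% level: TRUE means reject H0
alpha <- 0.05
result$p.value < alpha
```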
To test whether the CDF of one sample lies above or below that of another, use the Kolmogorov-Smirnov test in R with the argument "alternative" set to "less" (the CDF of Sample 1 lies below that of Sample 2) or "greater" (the CDF of Sample 1 lies above that of Sample 2).
The one-sided Kolmogorov-Smirnov test null and alternative hypotheses are:
\(H_0\): The CDF of Sample 1 and the CDF of Sample 2 are identical.
\(H_1\): The CDF of Sample 1 lies above (or below) the CDF of Sample 2.
With level of significance \(\alpha = 0.05\), test if the CDF of a sample with \(75\) observations from the uniform distribution with \(\tt{min = 0}\) and \(\tt{max = 1}\) lies above the CDF of another sample with \(80\) observations from the uniform distribution with \(\tt{min = 0.2}\) and \(\tt{max = 1.2}\).
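# Sample 1 from Uniform(0, 1); Sample 2 from Uniform(0.2, 1.2)
sample1 <- runif(75, 0, 1)
sample2 <- runif(80, 0.2, 1.2)
# "greater": H1 is that the CDF of sample1 lies above that of sample2
ks.test(sample1, sample2, alternative = "greater")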
Exact two-sample Kolmogorov-Smirnov test
data: sample1 and sample2
D^+ = 0.25583, p-value = 0.004928
alternative hypothesis: the CDF of x lies above that of y
The large \(D^+\) and the \(\tt{p-value}\) below the level of significance \((\alpha = 0.05)\) arise because Sample 1 comes from a distribution with typically smaller observations than the distribution Sample 2 comes from, which causes the CDF of Sample 1 to lie above the CDF of Sample 2. Since the \(\tt{p-value\;(0.004928)}\) is less than \(\alpha = 0.05\), we reject \(H_0\) in favor of \(H_1\), that the CDF of Sample 1 lies above the CDF of Sample 2.
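The one-sided statistics can be checked by hand in the same way as the two-sided one. In this sketch (the seed and the names D_plus and D_minus are illustrative assumptions), \(D^+\) is the largest amount by which the empirical CDF of Sample 1 exceeds that of Sample 2, and \(D^-\) is the largest amount by which it falls below.

```r
set.seed(99)  # arbitrary seed (assumption)
sample1 <- runif(75, 0, 1)
sample2 <- runif(80, 0.2, 1.2)

# Empirical CDFs, evaluated at every observed value
F1 <- ecdf(sample1)
F2 <- ecdf(sample2)
pooled <- c(sample1, sample2)

D_plus  <- max(F1(pooled) - F2(pooled))  # Sample 1 CDF above Sample 2 CDF
D_minus <- max(F2(pooled) - F1(pooled))  # Sample 1 CDF below Sample 2 CDF

# One-sided tests in both directions
greater <- ks.test(sample1, sample2, alternative = "greater")
less    <- ks.test(sample1, sample2, alternative = "less")

all.equal(D_plus,  unname(greater$statistic))  # should agree
all.equal(D_minus, unname(less$statistic))     # should agree
```

The two-sided statistic is the larger of these two values, which is why a one-sided test can only be as sensitive as the direction it looks in.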
Copyright © 2020 - 2024. All Rights Reserved by Stats Codes