Here, we discuss how to test for normality in R using histograms, quantile-quantile (Q-Q) plots, the Shapiro-Wilk test, and the Kolmogorov-Smirnov test.
One way to check in R whether a sample comes from a normal distribution is visually, using histograms and quantile-quantile (Q-Q) plots. Normality can also be tested formally with two well-known tests, the Shapiro-Wilk test and the Kolmogorov-Smirnov test.
All the functions below are from the base "stats" package, except the hist() function, which is from the base "graphics" package.
Function | Usage |
hist(sample) | Test based on histogram |
qqnorm(sample) | Test based on normal quantile-quantile plot |
shapiro.test(sample) | Test based on the Shapiro-Wilk test |
ks.test(sample, pnorm, mean, sd) | Test based on the Kolmogorov-Smirnov test |
Here we check for normality using histograms, density histograms, and normal distribution density plots.
For example, take a sample of \(600\) observations drawn from the normal distribution with \(\tt{mean = 12}\) and \(\tt{sd = 3}\), and assess whether it looks normally distributed.
We can plot the histogram, and a density histogram overlaid with both the sample density and the normal density. The more bell-shaped the histogram appears, the more plausible it is that the sample is from a normal distribution. Similarly, the more closely the sample density line and the normal density line overlap, the more plausible it is that the sample is from a normal distribution.
Based on the plots produced by the code below, the sample appears, as expected, to be consistent with a normal distribution.
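The plotting code refers to a vector named hs_sample whose creation is not shown. A minimal sketch of how such a sample might be generated is given here. The seed value is an assumption chosen only for reproducibility, so the resulting plots will not exactly match the ones this page was written from.

```r
# Assumed setup: the seed is arbitrary and not taken from the original example
set.seed(123)

# 600 observations from the normal distribution with mean = 12 and sd = 3
hs_sample = rnorm(600, mean = 12, sd = 3)

# Quick checks that the sample looks as intended
length(hs_sample)
mean(hs_sample)
sd(hs_sample)
```

The sample mean and standard deviation should be close to, but not exactly, 12 and 3.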
# Histogram Code
hist(hs_sample,
     main = "Histogram of Sample",
     xlab = "Observations",
     ylab = "Frequency",
     xlim = c(0, 24),
     ylim = c(0, 200))
# Density Histogram
hist(hs_sample, freq = FALSE,
     main = "Density Histogram of Sample",
     xlab = "Observations",
     ylab = "Density",
     xlim = c(0, 24),
     ylim = c(0, 0.2))
# Sample Density
den = density(hs_sample)
lines(den, col = "blue", lwd = 1.5)
# Normal Density, using the sample mean and standard deviation
x = seq(0, 24, by = 1/1000)
y = dnorm(x, mean(hs_sample), sd(hs_sample))
lines(x, y, col = "green", lwd = 2)
legend("topright", c("Sample", "Normal"),
       fill = c("blue", "green"))
As a contrasting example, take a sample of \(600\) observations drawn from the exponential distribution with \(\tt{rate = 1/12}\), that is, \(\tt{mean = 12}\), and assess whether it looks normally distributed.
Because the histogram is strongly right-skewed and the sample density departs from the normal density, we can conclude, as expected, that the sample is NOT from a normal distribution.
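The code for this exponential example is not shown. A minimal sketch, assuming a hypothetical vector name hs_sample2 and an arbitrary seed, might look like this:

```r
# Assumed setup: the name hs_sample2 and the seed are illustrative choices
set.seed(123)

# 600 observations from the exponential distribution (rate = 1/12, so mean = 12)
hs_sample2 = rexp(600, rate = 1/12)

# Density histogram of the exponential sample
hist(hs_sample2, freq = FALSE,
     main = "Density Histogram of Exponential Sample",
     xlab = "Observations",
     ylab = "Density")

# Sample density (blue)
lines(density(hs_sample2), col = "blue", lwd = 1.5)

# Normal density with the same mean and standard deviation (green)
x = seq(min(hs_sample2), max(hs_sample2), length.out = 1000)
lines(x, dnorm(x, mean(hs_sample2), sd(hs_sample2)), col = "green", lwd = 2)

legend("topright", c("Sample", "Normal"),
       fill = c("blue", "green"))
```

The histogram should pile up near zero with a long right tail, so the blue and green curves should visibly disagree.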
Here we check for normality using quantile-quantile (Q-Q) plots.
For example, take a sample of \(200\) observations drawn from the normal distribution with \(\tt{mean = 3}\) and \(\tt{sd = 1}\), and assess whether it looks normally distributed.
In the Q-Q plot, the more closely the points follow a straight line, the more plausible it is that the sample is from a normal distribution.
Based on the plots produced by the code below, the sample appears, as expected, to be consistent with a normal distribution.
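The Q-Q plot code refers to a vector named qq_sample whose creation is not shown. One way it might be generated is sketched here, with an arbitrary seed as an assumption. As an optional extra that is not part of the original example, the correlation between theoretical and sample quantiles gives a rough numeric sense of how straight the Q-Q points are.

```r
# Assumed setup: the seed is arbitrary and not taken from the original example
set.seed(123)

# 200 observations from the normal distribution with mean = 3 and sd = 1
qq_sample = rnorm(200, mean = 3, sd = 1)

# Compute the Q-Q coordinates without drawing the plot
qq = qqnorm(qq_sample, plot.it = FALSE)

# Correlation between theoretical (x) and sample (y) quantiles;
# values close to 1 mean the points lie close to a straight line
qq_cor = cor(qq$x, qq$y)
qq_cor
```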
# Simple Q-Q Plot Code
qqnorm(qq_sample)
# Q-Q Plot Code with Line Overlaid
qqnorm(qq_sample,
       main = "Sample Normal Q-Q Plot",
       xlab = "Quantiles from Normal Distribution",
       ylab = "Quantiles from Sample")
# Reference line through the first and third quartiles
qqline(qq_sample, col = "red", lwd = 2)
As a contrasting example, take a sample of \(100\) observations drawn from the exponential distribution with \(\tt{rate = 1/3}\), that is, \(\tt{mean = 3}\), and assess whether it looks normally distributed.
Because the points in the Q-Q plot curve away from a straight line, we can conclude, as expected, that the sample is NOT from a normal distribution.
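The code for this exponential Q-Q plot is not shown. A minimal sketch, assuming a hypothetical vector name qq_sample2 and an arbitrary seed, could be:

```r
# Assumed setup: the name qq_sample2 and the seed are illustrative choices
set.seed(123)

# 100 observations from the exponential distribution (rate = 1/3, so mean = 3)
qq_sample2 = rexp(100, rate = 1/3)

# Normal Q-Q plot with a reference line overlaid
qqnorm(qq_sample2,
       main = "Exponential Sample Normal Q-Q Plot",
       xlab = "Quantiles from Normal Distribution",
       ylab = "Quantiles from Sample")
qqline(qq_sample2, col = "red", lwd = 2)
```

For a right-skewed sample like this one, the points typically bend upward away from the line at the right end of the plot.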
The Shapiro-Wilk normality test null and alternative hypotheses are:
\(H_0\): The sample is from a normal distribution.
\(H_1\): The sample is NOT from a normal distribution.
The test statistic is \(W\), which takes values between \(0\) and \(1\). The lower the value of \(W\), the more the sample departs from a normal distribution, and the smaller the p-value.
The sample of interest here is a sample of \(60\) observations from the normal distribution with \(\tt{mean = 4}\) and \(\tt{sd = 1.5}\).
With level of significance \(\alpha = 0.05\), test if the sample is from a normal distribution.
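The call that produces the test output is not shown. A minimal sketch of how sw_sample might be generated and tested follows. The seed is an assumption, so \(W\) and the p-value from this sketch will differ from the values reported on this page.

```r
# Assumed setup: the seed is arbitrary and not taken from the original example
set.seed(123)

# 60 observations from the normal distribution with mean = 4 and sd = 1.5
sw_sample = rnorm(60, mean = 4, sd = 1.5)

# Run the Shapiro-Wilk normality test and print the full result
sw_result = shapiro.test(sw_sample)
sw_result

# The statistic and p-value can also be extracted individually
sw_result$statistic
sw_result$p.value
```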
Shapiro-Wilk normality test
data: sw_sample
W = 0.99291, p-value = 0.98
As expected, we have high \(W\), and high \(\tt{p-value}\;(0.98)\) above the level of significance \((\alpha = 0.05)\). Hence, we fail to reject \(H_0\) that the sample is from a normal distribution.
The sample of interest here is a sample of \(100\) observations from the exponential distribution with \(\tt{rate = 1/4}\) or \(\tt{mean = 4}\).
With level of significance \(\alpha = 0.05\), test if the sample is from a normal distribution.
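The call for this exponential sample is likewise not shown. The sketch below assumes an arbitrary seed and also spells out the decision rule at \(\alpha = 0.05\) in code. The numbers it prints will not match the reported output exactly.

```r
# Assumed setup: the seed is arbitrary and not taken from the original example
set.seed(123)

# 100 observations from the exponential distribution (rate = 1/4, so mean = 4)
sw_sample2 = rexp(100, rate = 1/4)

# Run the Shapiro-Wilk normality test
sw_result2 = shapiro.test(sw_sample2)
sw_result2

# Decision rule: reject H0 when the p-value is below alpha
alpha = 0.05
reject_h0 = sw_result2$p.value < alpha
if (reject_h0) {
  message("Reject H0: the sample does not appear to be from a normal distribution.")
} else {
  message("Fail to reject H0.")
}
```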
Shapiro-Wilk normality test
data: sw_sample2
W = 0.76107, p-value = 1.856e-11
As expected, we have low \(W\), and low \(\tt{p-value}\;(1.856e-11)\) below the level of significance \((\alpha = 0.05)\). Hence, we reject \(H_0\) that the sample is from a normal distribution.
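The one-sample Kolmogorov-Smirnov normality test null and alternative hypotheses are:
\(H_0\): The sample is from the normal distribution with \(\tt{mean}\) and \(\tt{sd}\) specified.
\(H_1\): The sample is NOT from the normal distribution with \(\tt{mean}\) and \(\tt{sd}\) specified.
The test statistic is \(D\), the largest absolute difference between the empirical distribution function of the sample and the specified normal distribution function. The higher the value of \(D\), the more the sample departs from the specified normal distribution, and the smaller the p-value.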
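To make the statistic concrete, the following sketch computes \(D\) by hand as the largest gap between the empirical distribution function and the hypothesized normal distribution function, then compares it with the value reported by ks.test(). The sample, seed, and variable names here are illustrative assumptions.

```r
# Assumed setup: sample, seed, and variable names are illustrative only
set.seed(123)
x = sort(rnorm(60, mean = 8, sd = 3))
n = length(x)

# Hypothesized normal distribution function at each sorted observation
Fx = pnorm(x, mean = 8, sd = 3)

# The empirical distribution function jumps from (i - 1)/n to i/n at the
# i-th sorted observation, so check the gap on both sides of each jump
i = seq_len(n)
D_plus = max(i / n - Fx)
D_minus = max(Fx - (i - 1) / n)
D_manual = max(D_plus, D_minus)

# Statistic reported by ks.test() for the same hypothesized distribution
D_ks = unname(ks.test(x, "pnorm", mean = 8, sd = 3)$statistic)

# The two values should agree up to floating-point error
all.equal(D_manual, D_ks)
```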
The sample of interest here is a sample of \(60\) observations from the normal distribution with \(\tt{mean = 8}\) and \(\tt{sd = 3}\).
With level of significance \(\alpha = 0.05\), test if the sample is from the normal distribution with \(\tt{mean = 8}\) and \(\tt{sd = 3}\).
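The call that produces the test output is not shown. A minimal sketch of how ks_sample might be generated and tested against the hypothesized parameters follows. The seed is an assumption, so the printed \(D\) and p-value will differ from the values reported on this page.

```r
# Assumed setup: the seed is arbitrary and not taken from the original example
set.seed(123)

# 60 observations from the normal distribution with mean = 8 and sd = 3
ks_sample = rnorm(60, mean = 8, sd = 3)

# One-sample Kolmogorov-Smirnov test against N(mean = 8, sd = 3);
# the hypothesized mean and sd are fixed in advance, not estimated from the sample
ks_result = ks.test(ks_sample, "pnorm", mean = 8, sd = 3)
ks_result
```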
Exact one-sample Kolmogorov-Smirnov test
data: ks_sample
D = 0.083034, p-value = 0.7713
alternative hypothesis: two-sided
As expected, we have low \(D\), and high \(\tt{p-value}\;(0.7713)\) above the level of significance \((\alpha = 0.05)\). Hence, we fail to reject \(H_0\) that the sample is from the normal distribution with \(\tt{mean = 8}\) and \(\tt{sd = 3}\).
With level of significance \(\alpha = 0.05\), test if the sample is from the normal distribution with \(\tt{mean = 5}\) and \(\tt{sd = 2}\).
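Only the hypothesized parameters change for this second test, and the sample itself is still drawn from the normal distribution with \(\tt{mean = 8}\) and \(\tt{sd = 3}\). A self-contained sketch, again assuming an arbitrary seed, might be:

```r
# Assumed setup: the seed is arbitrary and not taken from the original example
set.seed(123)

# Same kind of sample as before: 60 observations with mean = 8 and sd = 3
ks_sample = rnorm(60, mean = 8, sd = 3)

# Test against a deliberately mismatched hypothesis, N(mean = 5, sd = 2)
ks_result2 = ks.test(ks_sample, "pnorm", mean = 5, sd = 2)
ks_result2

# TRUE means H0 is rejected at the 0.05 level
ks_result2$p.value < 0.05
```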
Exact one-sample Kolmogorov-Smirnov test
data: ks_sample
D = 0.53547, p-value = 6.661e-16
alternative hypothesis: two-sided
As expected, we have high \(D\), and low \(\tt{p-value}\;(6.661e-16)\) below the level of significance \((\alpha = 0.05)\). Hence, we reject \(H_0\) that the sample is from the normal distribution with \(\tt{mean = 5}\) and \(\tt{sd = 2}\).
Copyright © 2020 - 2024. All Rights Reserved by Stats Codes