Here, we discuss the F distribution functions in R: plotting, setting parameters, random sampling, and computing densities, cumulative distributions, and quantiles.
The F distribution with parameters \(\tt{degrees\;of\;freedom\;1}=d_1\) and \(\tt{degrees\;of\;freedom\;2}=d_2\) has the probability density function (pdf):
\[\begin{align} f(x)& =\frac{\Gamma \left(\frac{d_1 +d_2}{2} \right)} {\Gamma \left(\frac{d_1}{2} \right)\,\Gamma \left(\frac{d_2}{2} \right)} \left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1} \left(1+\frac{d_1}{d_2} \, x \right)^{-(d_1+d_2)/2}\\ & = \frac{1}{\operatorname{B}\left(\frac{d_1}{2},\frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1} \left(1+\frac{d_1}{d_2} \, x \right)^{-(d_1+d_2)/2},\end{align}\]
for \(x \in (0, +\infty)\) if \(d_1 = 1\), and \(x \in [0, +\infty)\) otherwise, where \(d_1>0\) and \(d_2>0\),
\(\Gamma\) is the \(\tt{gamma\;function}\), and \(\mathrm{B}\) is the \(\tt{beta\;function}\).
The mean is \(\frac{d_2}{d_2-2}\) for \(d_2 > 2\), and the variance is \(\frac{2\,d_2^2\,(d_1+d_2-2)}{d_1 (d_2-2)^2 (d_2-4)}\) for \(d_2 > 4\).
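To connect the formula to R, the sketch below writes out both forms of the pdf by hand and compares them with the built-in df(), then recovers the mean and variance numerically with integrate(). The helper names f_pdf and f_pdf_beta, and the choice \(d_1 = 5\), \(d_2 = 12\), are illustrative assumptions, not part of R.

```r
# Hypothetical helper: the F pdf written out from the gamma-function form
f_pdf <- function(x, d1, d2) {
  gamma((d1 + d2) / 2) / (gamma(d1 / 2) * gamma(d2 / 2)) *
    (d1 / d2)^(d1 / 2) * x^(d1 / 2 - 1) *
    (1 + d1 / d2 * x)^(-(d1 + d2) / 2)
}

# Hypothetical helper: the same pdf written with the beta function
f_pdf_beta <- function(x, d1, d2) {
  (d1 / d2)^(d1 / 2) * x^(d1 / 2 - 1) *
    (1 + d1 / d2 * x)^(-(d1 + d2) / 2) / beta(d1 / 2, d2 / 2)
}

d1 <- 5; d2 <- 12
x <- c(0.25, 1, 2.5)

# Both hand-coded forms next to R's df() at a few points
cbind(gamma_form = f_pdf(x, d1, d2),
      beta_form  = f_pdf_beta(x, d1, d2),
      builtin    = df(x, d1, d2))

# Numerical first and second moments of F(5, 12)
m1 <- integrate(function(t) t * df(t, d1, d2), 0, Inf)$value
m2 <- integrate(function(t) t^2 * df(t, d1, d2), 0, Inf)$value

# Compare with the closed-form mean and variance
c(mean = m1, theory = d2 / (d2 - 2))
c(variance = m2 - m1^2,
  theory = 2 * d2^2 * (d1 + d2 - 2) / (d1 * (d2 - 2)^2 * (d2 - 4)))
```

Since \(d_2 = 12 > 4\), both moments exist; for \(d_2 \le 4\) the second-moment integral would diverge.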
The table below shows the functions for F distributions in R.
| Function | Usage |
|---|---|
| rf(n, df1, df2, ncp) | Simulate a random sample with \(n\) observations |
| df(x, df1, df2, ncp) | Calculate the probability density at the point \(x\) |
| pf(q, df1, df2, ncp) | Calculate the cumulative distribution at the point \(q\) |
| qf(p, df1, df2, ncp) | Calculate the quantile value associated with \(p\) |
The examples here use central F distributions, so the "ncp" argument is omitted in the examples below.
For non-central F distributions, set the non-centrality parameter "ncp" to a non-zero value, since ncp = 0 gives the central F distribution.
For example, the same calculation with and without ncp = 0 returns the same value:
[1] 0.9775903
[1] 0.9775903
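A minimal sketch of this comparison, plus a genuinely non-central case. The point 2, the degrees of freedom (3, 30), and ncp = 5 are arbitrary illustrative choices, not the inputs behind the output shown above.

```r
# Central F: omitting ncp and setting ncp = 0 should agree
# (up to the numerical accuracy of the two algorithms R uses)
p_central <- pf(2, 3, 30)
p_ncp0    <- pf(2, 3, 30, ncp = 0)
c(p_central, p_ncp0)

# A positive non-centrality parameter shifts probability mass to the right,
# so the cumulative probability at the same point becomes smaller
p_ncp5 <- pf(2, 3, 30, ncp = 5)
p_ncp5
```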
Below is a plot of the probability density function of the F distribution with \(\tt{df1}=3\) and \(\tt{df2}=30\).
x = seq(0, 6, 1/1000); y = df(x, 3, 30)
plot(x, y, type = "l",
xlim = c(0, 6), ylim = c(0, max(y)),
main = "Probability Density Function of F Distribution (3, 30)",
xlab = "x", ylab = "Density",
lwd = 2, col = "blue")
# Add legend
legend("topright", "df1 = 3, df2 = 30",
lwd = 2,
col = "blue",
bty = "n")
Below is a plot of several F density functions in one graph.
x1 = seq(0, 10, 1/1000); y1 = df(x1, 2, 50)
x2 = seq(0, 10, 1/1000); y2 = df(x2, 20, 20)
x3 = seq(0, 10, 1/1000); y3 = df(x3, 50, 2)
plot(x1, y1, type = "l",
xlim = c(0, 10), ylim = range(c(y1, y2, y3)),
main = "Probability Density Functions of F Distributions",
xlab = "x", ylab = "Density",
lwd = 2, col = "blue")
lines(x2, y2, lwd = 2, col = "red")
lines(x3, y3, lwd = 2, col = "green")
# Add legend
legend("topright", c("df1 = 2, df2 = 50",
"df1 = 20, df2 = 20",
"df1 = 50, df2 = 2"),
lwd = c(2, 2, 2),
col = c("blue", "red", "green"),
bty = "n")
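The same three curves can also be drawn in a single call. The sketch below uses base R's matplot(), which plots each column of a matrix as its own line; it is an alternative to the plot() plus added-lines approach, and the params list is just a container introduced here for illustration.

```r
# (df1, df2) pairs and colours, matching the example above
params <- list(c(2, 50), c(20, 20), c(50, 2))
cols <- c("blue", "red", "green")
x <- seq(0, 10, 1/1000)

# One column of densities per (df1, df2) pair
dens <- sapply(params, function(p) df(x, p[1], p[2]))

# matplot() draws every column of dens against x
matplot(x, dens, type = "l", lty = 1, lwd = 2, col = cols,
        xlim = c(0, 10), ylim = range(dens),
        main = "Probability Density Functions of F Distributions",
        xlab = "x", ylab = "Density")

# Legend labels built from the same parameter list
legend("topright",
       sapply(params, function(p) paste0("df1 = ", p[1], ", df2 = ", p[2])),
       lwd = 2, col = cols, bty = "n")
```

Keeping the parameters in one list means adding a fourth curve only requires one more entry in params and cols.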
To set the parameters of the F distribution, here \(\tt{df1} = 4\) and \(\tt{df2} = 80\), pass them either by position or by name.
For example, for qf(), the following are the same:
# The order of 4 and 80 matters here as the parameter names are not used.
# The first number 4 is df1, and 80 is df2.
qf(0.5, 4, 80)
[1] 0.846329
[1] 0.846329
[1] 0.846329
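Not all calls behind the three matching outputs are shown above. As a hedged illustration (not necessarily the exact calls originally used), the sketch below contrasts positional and named arguments, and shows what happens if unnamed values are swapped.

```r
# Positional: the first number after p is df1, the second is df2
q1 <- qf(0.5, 4, 80)
# Named: explicit, so the order of df1 and df2 no longer matters
q2 <- qf(0.5, df1 = 4, df2 = 80)
q3 <- qf(0.5, df2 = 80, df1 = 4)
c(q1, q2, q3)

# Swapping unnamed values silently switches to a different distribution, F(80, 4)
qf(0.5, 80, 4)
```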
Sample 900 observations from the F distribution with \(\tt{df1} = 2\) and \(\tt{df2} = 60\):
set.seed(12) # Makes the sample reproducible (any seed works).
obs = rf(900, 2, 60) # 900 draws from F(2, 60)
hist(obs,
main = "Histogram of 900 Observations from F Distribution
with Df1 = 2 and Df2 = 60",
xlab = "x",
col = "deepskyblue", border = "white")
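As a sanity check on the simulated sample, its mean and median can be compared with the theoretical values for \(\tt{df1} = 2\) and \(\tt{df2} = 60\): the mean \(\frac{d_2}{d_2 - 2} = \frac{60}{58}\) and the median qf(0.5, 2, 60). This is a sketch; with 900 draws the sample values should be close to, but not equal to, the theoretical ones.

```r
set.seed(12)            # same seed as above, for reproducibility
obs <- rf(900, 2, 60)   # 900 draws from F(2, 60)

# Sample statistics versus theoretical values
c(sample_mean = mean(obs), theory_mean = 60 / (60 - 2))
c(sample_median = median(obs), theory_median = qf(0.5, 2, 60))
```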
Calculate the density at \(x = 1.6\), in the F distribution with \(\tt{df1} = 6\) and \(\tt{df2} = 45\):
df(1.6, 6, 45)
[1] 0.2837588
x = seq(0, 5, 1/1000); y = df(x, 6, 45)
plot(x, y, type = "l",
xlim = c(0, 5), ylim = c(0, max(y)),
main = "Probability Density Function of F Distribution
with Df1 = 6 and Df2 = 45",
xlab = "x", ylab = "Density",
lwd = 2, col = "blue")
# Add lines
segments(1.6, -1, 1.6, 0.2837588)
segments(-1, 0.2837588, 1.6, 0.2837588)
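A density value is not itself a probability. One way to read it, sketched below, is that for a narrow interval around 1.6 the probability of landing in the interval is approximately the density times the interval width. The width 0.1 is an arbitrary illustrative choice.

```r
d <- df(1.6, 6, 45)

# Probability of the interval [1.55, 1.65], computed from the CDF
width <- 0.1
p_interval <- pf(1.6 + width / 2, 6, 45) - pf(1.6 - width / 2, 6, 45)

# For a narrow interval, probability is approximately density * width
c(p_interval = p_interval, approx = d * width)
```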
Calculate the cumulative distribution at \(x = 2.1\), in the F distribution with \(\tt{df1} = 7\) and \(\tt{df2} = 40\). That is, \(P(X \le 2.1)\):
pf(2.1, 7, 40)
[1] 0.93415
x = seq(0, 4, 1/1000); y = pf(x, 7, 40)
plot(x, y, type = "l",
xlim = c(0, 4), ylim = c(0,1),
main = "Cumulative Distribution Function of F Distribution
with Df1 = 7 and Df2 = 40",
xlab = "x", ylab = "Cumulative Distribution",
lwd = 2, col = "blue")
# Add lines
segments(2.1, -1, 2.1, 0.93415)
segments(-1, 0.93415, 2.1, 0.93415)
x = seq(0, 4, 1/1000); y = df(x, 7, 40)
plot(x, y, type = "l",
xlim = c(0, 4), ylim = c(0, max(y)),
main = "Probability Density Function of F Distribution
with Df1 = 7 and Df2 = 40",
xlab = "x", ylab = "Density",
lwd = 2, col = "blue")
# Add shaded region and legend
point = 2.1
polygon(x = c(x[x <= point], point),
y = c(y[x <= point], 0),
col = "limegreen")
legend("topright", c("Area = 0.93415"),
fill = c("limegreen"),
inset = 0.01)
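The shaded area is exactly what pf() returns. As a check, the sketch below integrates the density from 0 to 2.1 numerically with integrate(), passing df1 and df2 through to df().

```r
# Area under the F(7, 40) density from 0 to 2.1, computed numerically
area <- integrate(df, lower = 0, upper = 2.1, df1 = 7, df2 = 40)$value

# Compare with the cumulative distribution from pf()
c(integrate = area, pf = pf(2.1, 7, 40))
```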
For the upper tail at \(x = 2.1\), that is, \(P(X \ge 2.1) = 1 - P(X \le 2.1)\), set the "lower.tail" argument to FALSE:
pf(2.1, 7, 40, lower.tail = FALSE)
[1] 0.06584997
x = seq(0, 4, 1/1000); y = df(x, 7, 40)
plot(x, y, type = "l",
xlim = c(0, 4), ylim = c(0, max(y)),
main = "Shaded Upper Region: Probability Density Function of
F Distribution with Df1 = 7 and Df2 = 40",
xlab = "x", ylab = "Density",
lwd = 2, col = "blue")
# Add shaded region and legend
point = 2.1
polygon(x = c(point, x[x >= point]),
y = c(0, y[x >= point]),
col = "limegreen")
legend("topright", c("Area = 0.06584997"),
fill = c("limegreen"),
inset = 0.01)
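Using lower.tail = FALSE gives the same result as subtracting from 1 at \(x = 2.1\). A sketch of why the argument is still worth using: far in the tail, 1 - pf() can lose all precision to rounding (it may print 0), while lower.tail = FALSE computes the small tail probability directly. The point 100 is an arbitrary illustrative choice.

```r
upper_direct     <- pf(2.1, 7, 40, lower.tail = FALSE)
upper_complement <- 1 - pf(2.1, 7, 40)
c(upper_direct, upper_complement)

# Far in the tail: direct upper-tail computation versus subtraction from 1
pf(100, 7, 40, lower.tail = FALSE)
1 - pf(100, 7, 40)
```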
Derive the quantile for \(p = 0.85\), in the F distribution with \(\tt{df1} = 6\) and \(\tt{df2} = 35\). That is, find \(x\) such that \(P(X \le x)=0.85\):
qf(0.85, 6, 35)
[1] 1.700715
x = seq(0, 4, 1/1000); y = pf(x, 6, 35)
plot(x, y, type = "l",
xlim = c(0, 4), ylim = c(0,1),
main = "Cumulative Distribution Function of
F Distribution with Df1 = 6 and Df2 = 35",
xlab = "x", ylab = "Cumulative Distribution",
lwd = 2, col = "blue")
# Add lines
segments(1.700715, -1, 1.700715, 0.85)
segments(-1, 0.85, 1.700715, 0.85)
x = seq(0, 4, 1/1000); y = df(x, 6, 35)
plot(x, y, type = "l",
xlim = c(0, 4), ylim = c(0, max(y)),
main = "Probability Density Function of F Distribution
with Df1 = 6 and Df2 = 35",
xlab = "x", ylab = "Density",
lwd = 2, col = "blue")
# Add shaded region and legend
point = 1.700715
polygon(x = c(x[x <= point], point),
y = c(y[x <= point], 0),
col = "limegreen")
legend("topright", c("Area = 0.85"),
fill = c("limegreen"),
inset = 0.01)
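qf() is the inverse of pf(), so evaluating the CDF at the returned quantile should give back 0.85. The sketch below checks this round trip and shows that qf() also accepts a vector of probabilities; the extra probabilities 0.5 and 0.95 are illustrative choices.

```r
q <- qf(0.85, 6, 35)

# pf() undoes qf(): the CDF at the 0.85 quantile is 0.85
pf(q, 6, 35)

# qf() is vectorised over p, returning one quantile per probability
qs <- qf(c(0.5, 0.85, 0.95), 6, 35)
qs
```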
For the upper tail with \(p = 0.15\), that is, find \(x\) such that \(P(X \ge x)=0.15\), set "lower.tail" to FALSE:
qf(0.15, 6, 35, lower.tail = FALSE)
[1] 1.700715
x = seq(0, 4, 1/1000); y = df(x, 6, 35)
plot(x, y, type = "l",
xlim = c(0, 4), ylim = c(0, max(y)),
main = "Shaded Upper Region: Probability Density Function of
F Distribution with Df1 = 6 and Df2 = 35",
xlab = "x", ylab = "Density",
lwd = 2, col = "blue")
# Add shaded region and legend
point = 1.700715
polygon(x = c(point, x[x >= point]),
y = c(0, y[x >= point]),
col = "limegreen")
legend("topright", c("Area = 0.15"),
fill = c("limegreen"),
inset = 0.01)
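The upper-tail quantile for \(p = 0.15\) is the same point as the lower-tail quantile for \(1 - 0.15 = 0.85\). The sketch below confirms this and checks the round trip with the upper-tail form of pf().

```r
q_upper <- qf(0.15, 6, 35, lower.tail = FALSE)
q_lower <- qf(1 - 0.15, 6, 35)
c(q_upper, q_lower)

# Round trip: the probability above q_upper is 0.15
pf(q_upper, 6, 35, lower.tail = FALSE)
```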
Copyright © 2020 - 2024. All Rights Reserved by Stats Codes