Here, we discuss the generalized linear model (GLM) in R with interpretations, including the binomial, Gaussian, Poisson, and gamma families.

The generalized linear model in R can be performed with the glm() function from the base "stats" package.

The generalized linear model can be used to study relationships, including non-linear ones, between the mean of a dependent variable \((y)\), which has a specified distribution, and a set of independent variables \((X = x_1, x_2, \ldots, x_p)\).

The generalized linear model framework is based on the theoretical assumption that, for a link function \(g\) and mean \(\mu = E(y|X)\):

\[\begin{align} g\left(E(y|X)\right) & = g(\mu) \\ & = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p. \end{align}\]

The generalized linear model estimates the true coefficients, \(\beta_1, \beta_2, \ldots, \beta_p\), as \(\widehat \beta_1, \widehat \beta_2, \ldots, \widehat \beta_p\), and the true intercept, \(\beta_0\), as \(\widehat \beta_0\).
Then, for any \(x_1, x_2, \ldots, x_p\) values, these estimates are used to predict or estimate the true \(\mu\), as \(\widehat \mu\), with the equation below:

\[\widehat \mu = g^{-1}\left( \widehat \beta_0 + \widehat \beta_1 x_1 + \widehat \beta_2 x_2 + \cdots + \widehat \beta_p x_p \right).\]
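
As a minimal sketch of the prediction equation above, the code below fits a Poisson GLM with the log link to simulated data, then checks that applying the inverse link to the estimated linear predictor gives the same value as predict() with type = "response". The simulated data, the seed, the coefficients used in the simulation (0.5 and 1.2), and the new value x1 = 0.4 are assumptions made only for illustration.

```r
# Simulated data (an assumption for illustration, not from the article)
set.seed(123)
x1 = runif(100)
y = rpois(100, lambda = exp(0.5 + 1.2 * x1))
sim_data = data.frame(y, x1)

# Fit a Poisson GLM; with the log link, g(mu) = log(mu)
fit = glm(y ~ x1, family = poisson(link = "log"), data = sim_data)

# Estimated linear predictor at a hypothetical new value x1 = 0.4
b = coef(fit)
eta_hat = unname(b["(Intercept)"] + b["x1"] * 0.4)

# Apply the inverse link, g^{-1} = exp, to get the estimated mean
mu_manual = exp(eta_hat)

# The same estimate from predict() on the response scale
mu_predict = unname(predict(fit, newdata = data.frame(x1 = 0.4),
                            type = "response"))

# Both routes should agree up to floating-point error
all.equal(mu_manual, mu_predict)
```

Using predict() is usually more convenient in practice; the manual route is shown only to make the role of \(g^{-1}\) explicit.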

See also logistic regression and binomial family GLM.

Sample Steps to Run a Generalized Linear Model:

# Create the data samples for the GLM
# Values are matched based on matching position in each sample

x1 = c(4.35, 6.18, 5.13, 5.80, 4.00, 4.15,
       5.09, 6.27, 7.15, 3.14, 4.75, 6.01)
x2 = c(3.76, 2.86, 3.85, 2.42, 1.84, 1.55,
       3.25, 3.87, 3.64, 2.16, 3.38, 4.03)
y = c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
glm_data = data.frame(y, x1, x2)
glm_data
   y   x1   x2
1  0 4.35 3.76
2  1 6.18 2.86
3  0 5.13 3.85
4  1 5.80 2.42
5  1 4.00 1.84
6  1 4.15 1.55
7  0 5.09 3.25
8  0 6.27 3.87
9  1 7.15 3.64
10 0 3.14 2.16
11 1 4.75 3.38
12 0 6.01 4.03
# Run the generalized linear model with family, link and variables specified

model = glm(y ~ x1 + x2, family = binomial(link = "logit"),
            data = glm_data)
summary(model)

Call:
glm(formula = y ~ x1 + x2, family = binomial(link = "logit"), 
    data = glm_data)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)    1.037      4.146   0.250   0.8026  
x1             2.191      1.306   1.678   0.0934 .
x2            -3.999      2.169  -1.844   0.0652 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 16.6355  on 11  degrees of freedom
Residual deviance:  7.6478  on  9  degrees of freedom
AIC: 13.648

Number of Fisher Scoring iterations: 6

Sample Interpretation of a Generalized Linear Model:

The example above is of the binomial family, hence the parameter of interest is \(p\) (see below). It also uses the "logit" link, hence the parameter function is \(p = \frac{\exp(X\beta)}{1 + \exp(X\beta)}\).

These imply that, for any \(x_1\) and \(x_2\), the estimated \(p\), or probability of success, is: \[\widehat p = \frac{\exp(1.037 + 2.191x_1 - 3.999x_2)}{1 + \exp(1.037 + 2.191x_1 - 3.999x_2)}\]
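
As a small worked example of this formula, the sketch below plugs the rounded coefficient estimates from the summary output into the logit parameter function. The predictor values x1 = 5 and x2 = 3 are hypothetical and chosen only for illustration.

```r
# Rounded coefficient estimates copied from the summary output above
b0 = 1.037
b1 = 2.191
b2 = -3.999

# Hypothetical predictor values chosen only for illustration
x1_new = 5
x2_new = 3

# Estimated linear predictor: 1.037 + 10.955 - 11.997 = -0.005
eta_hat = b0 + b1 * x1_new + b2 * x2_new

# Estimated probability of success from the logit parameter function
p_hat = exp(eta_hat) / (1 + exp(eta_hat))

# plogis() from the "stats" package computes the same inverse logit
p_hat_plogis = plogis(eta_hat)

c(eta_hat = eta_hat, p_hat = p_hat, p_hat_plogis = p_hat_plogis)
```

With these values the linear predictor is close to zero, so the estimated probability of success is close to 0.5 (about 0.4988).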

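Also, the AIC is used to compare candidate models fitted to the same data: the lower the AIC, the better the trade-off between model fit and the number of parameters.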
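
A minimal sketch of such a comparison, using the sample data from the steps above: the model with both predictors is compared with a smaller model that uses only x1. The smaller model and the object names are assumptions introduced here for illustration, and with only 12 observations the comparison is illustrative rather than conclusive.

```r
# Sample data from the steps above
x1 = c(4.35, 6.18, 5.13, 5.80, 4.00, 4.15,
       5.09, 6.27, 7.15, 3.14, 4.75, 6.01)
x2 = c(3.76, 2.86, 3.85, 2.42, 1.84, 1.55,
       3.25, 3.87, 3.64, 2.16, 3.38, 4.03)
y = c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
glm_data = data.frame(y, x1, x2)

# Two candidate models fitted to the same data
model_full = glm(y ~ x1 + x2, family = binomial(link = "logit"),
                 data = glm_data)
model_x1 = glm(y ~ x1, family = binomial(link = "logit"),
               data = glm_data)

# AIC() returns one row per model, with the number of parameters (df)
# and the AIC value; the model with the lower AIC is preferred
aic_table = AIC(model_x1, model_full)
aic_table
```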

Table of Some GLM Families, Distributions & Link Functions in R

| Family | Links | Distribution | Dependent Variable |
|---|---|---|---|
| Binomial: binomial. Parameter: \(p\) | logit (Default), probit, cauchit, log, cloglog | Bernoulli \((p)\): \(\{0, 1\}\) | 1’s & 0’s, or a factor with two levels: ‘Yes’ & ‘No’ |
| Binomial: binomial. Parameter: \(p\) | logit (Default), probit, cauchit, log, cloglog | Binomial \((n, p)\): \(\{0, 1, \ldots, n\}\) | Two-column matrix with the counts of ‘success’ and the counts of ‘failure’ for each case |
| Gaussian: gaussian. Parameter: \(\mu\) | identity (Default), log, inverse | Normal \((\mu, \sigma)\): \((-\infty, \infty)\) | Continuous variable with normal/Gaussian distribution |
| Poisson: poisson. Parameter: \(\lambda\) | log (Default), identity, sqrt | Poisson \((\lambda)\): \(\{0, 1, 2, \ldots\}\) | Counts of events in a fixed space or time period |
| Gamma: Gamma. Parameter: \(\alpha/\beta\) | inverse (Default), identity, log | Gamma \((\alpha, \beta)\): \((0, \infty)\) | Continuous variable with gamma distribution |
| Gamma: Gamma. Parameter: \(\alpha/\beta\) | inverse (Default), identity, log | Exponential \((\lambda)\): \((0, \infty)\). Gamma \((1, \beta)\) \(\equiv\) Exponential \((\beta)\) | Continuous variable with exponential distribution |
| Inverse Gaussian: inverse.gaussian. Parameter: \(\mu\) | 1/mu^2 (Default), inverse, identity, log | Inverse Gaussian \((\mu, \lambda)\): \((0, \infty)\) | Continuous variable with inverse Gaussian distribution |
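
The default links listed in the table can be checked directly from R's family objects. The sketch below is only a quick inspection; the object names families and default_links are introduced here for illustration.

```r
# Family objects from the "stats" package, created with default arguments
families = list(
  binomial = binomial(),
  gaussian = gaussian(),
  poisson = poisson(),
  Gamma = Gamma(),
  inverse.gaussian = inverse.gaussian()
)

# Each family object stores the name of its link in the "link" element
default_links = sapply(families, function(f) f$link)
default_links

# A non-default link is requested through the family's link argument
binomial(link = "probit")$link
```

Note that the gamma family is spelled Gamma() with a capital G, because gamma() in R is the gamma function.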


Table of GLM Link Functions in R

Here, \(X\beta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\), and the parameter \(\mu\) or \(p\) is the expected value of \(y\) \((E(y|X))\).

| Link | Link Function | Parameter Function |
|---|---|---|
| Logit: logit | \(\log\left(\frac{p}{1-p}\right) = X\beta\) | \(p = \frac{\exp(X\beta)}{1 + \exp(X\beta)}\) |
| Probit: probit | \(\Phi^{-1}(p) = X\beta\) | \(p = \Phi(X\beta)\) |
| Cauchit: cauchit | \(\tan\left(\pi\left(p-\frac{1}{2}\right)\right) = X\beta\) | \(p = \frac{1}{\pi}\arctan\left(X\beta\right) + \frac{1}{2}\) |
| Complementary Log-Log: cloglog | \(\log(-\log(1-p)) = X\beta\) | \(p = 1-\exp(-\exp(X\beta))\) |
| Identity: identity | \(\mu = X\beta\) | \(\mu = X\beta\) |
| Log: log | \(\log(\mu) = X\beta\) | \(\mu = \exp(X\beta)\) |
| Inverse: inverse | \(\frac{1}{\mu} = X\beta\) | \(\mu = \frac{1}{X\beta}\) |
| Square Root: sqrt | \(\sqrt \mu = X\beta\) | \(\mu = (X\beta)^2\) |
| Inverse Square: 1/mu^2 | \(\frac{1}{\mu^2} = X\beta\) | \(\mu = \frac{1}{\sqrt {X\beta}}\) |

Probit: \(\Phi\) is the cumulative distribution function of the standard normal distribution.
Cauchit: Similar to the probit, but based on the cumulative distribution function of the standard Cauchy distribution.
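
The link and parameter (inverse link) functions in the table are available in R through make.link() from the "stats" package. The sketch below, with arbitrary test values chosen for illustration, checks that each parameter function undoes its link function and that a few of the closed forms in the table match what R computes.

```r
# Link names as accepted by make.link()
link_names = c("logit", "probit", "cauchit", "cloglog", "identity",
               "log", "inverse", "sqrt", "1/mu^2")

# Arbitrary test values in (0, 1), valid as a mean for every link above
mu = c(0.1, 0.5, 0.9)

# For each link, apply the link function and then its inverse
round_trip_ok = sapply(link_names, function(nm) {
  lnk = make.link(nm)
  eta = lnk$linkfun(mu)                     # g(mu) = X beta
  isTRUE(all.equal(lnk$linkinv(eta), mu))   # g^{-1}(X beta) = mu
})
round_trip_ok

# Compare some closed forms from the table with R's implementations
logit_link = make.link("logit")
logit_matches = all.equal(logit_link$linkfun(mu), log(mu / (1 - mu)))

cauchit_link = make.link("cauchit")
cauchit_matches = all.equal(cauchit_link$linkfun(mu), tan(pi * (mu - 0.5)))

inv_sq_link = make.link("1/mu^2")
inv_sq_matches = all.equal(inv_sq_link$linkinv(4), 1 / sqrt(4))

c(logit_matches, cauchit_matches, inv_sq_matches)
```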


Table of Some Generalized Linear Model Arguments in R

| Argument | Usage |
|---|---|
| y ~ x1 + x2 +…+ xp | y is the dependent variable, and x1, x2, …, xp are the independent variables |
| y ~ ., data | y is the dependent variable, and "." means all other variables in data are in the model |
| family | The GLM family based on the distribution of the dependent variable |
| link | The link function relating the mean of the dependent variable to the linear combination of the independent variables; it is set inside the family, for example binomial(link = "logit") |
| data | The data frame object that contains the dependent and independent variables |
| start | Starting values for the coefficients, to aid convergence |
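
A minimal sketch of the formula shorthand and the start argument, using the sample data from the steps above. The starting values of zero and the object names are assumptions chosen for illustration; start values are normally only needed when the default fit has trouble converging.

```r
# Sample data from the steps above
x1 = c(4.35, 6.18, 5.13, 5.80, 4.00, 4.15,
       5.09, 6.27, 7.15, 3.14, 4.75, 6.01)
x2 = c(3.76, 2.86, 3.85, 2.42, 1.84, 1.55,
       3.25, 3.87, 3.64, 2.16, 3.38, 4.03)
y = c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
glm_data = data.frame(y, x1, x2)

# Predictors listed explicitly
model_named = glm(y ~ x1 + x2, family = binomial(link = "logit"),
                  data = glm_data)

# "." stands for every other column in glm_data (here x1 and x2)
model_dot = glm(y ~ ., family = binomial(link = "logit"),
                data = glm_data)

# Starting values: one per coefficient, in the order intercept, x1, x2
model_start = glm(y ~ x1 + x2, family = binomial(link = "logit"),
                  data = glm_data, start = c(0, 0, 0))

# The first two fits are the same model; the third should reach
# essentially the same estimates from a different starting point
all.equal(coef(model_named), coef(model_dot))
all.equal(coef(model_named), coef(model_start), tolerance = 1e-3)
```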

1 Examples: Binomial Family GLM in R

2 Examples: Poisson Family GLM in R

3 Examples: Gaussian Family GLM in R

4 Examples: Gamma Family GLM in R

5 Creating Generalized Linear Model Summary Object and Model Object

# Create data (no seed is set, so the values shown below will differ between runs)
x1 = rnorm(1000, 4, 2)
x2 = rnorm(1000, 3, 1)
xb = 1 + 2*x1 - 3*x2
p = exp(xb)/(1+exp(xb))
y = rbinom(1000, 1, p)

# Create generalized linear model object and summary object
# (the model is fitted once, then summarized)
reg_model = glm(y ~ x1 + x2, family = binomial())
reg_summary = summary(reg_model)
# Extract a component from GLM summary object
reg_summary$coefficients; reg_summary$coefficients[, 1]
             Estimate Std. Error    z value     Pr(>|z|)
(Intercept)  1.143466  0.4118878   2.776158 5.500548e-03
x1           1.819446  0.1275497  14.264599 3.636756e-46
x2          -2.771271  0.2084446 -13.295001 2.474716e-40
(Intercept)          x1          x2 
   1.143466    1.819446   -2.771271 
# Extract a component from GLM model object
reg_model$coefficients
(Intercept)          x1          x2 
   1.143466    1.819446   -2.771271 

Table of Some Generalized Linear Model Summary and Model Object Outputs in R

| GLM Component | Usage |
|---|---|
| reg_summary$coefficients | The estimated intercept and beta values, with their standard errors, z-values and p-values |
| reg_summary$aic | The Akaike information criterion (AIC) value |
| reg_summary$null.deviance | The null model’s deviance |
| reg_summary$deviance | The model’s deviance |
| reg_summary$df.residual | The model’s residual degrees of freedom |
| reg_summary$df.null | The null model’s residual degrees of freedom |
| reg_model$coefficients | The estimated intercept and beta values |
| reg_model$fitted.values | The fitted mean values \(\widehat \mu\) (for the binomial family, the fitted probabilities) |
| reg_model$linear.predictors | The fitted linear predictor values, on the scale of the link function |
| reg_model$residuals | The working residuals from the final iteration of the fit |
| reg_model$aic | The Akaike information criterion (AIC) value |
| reg_model$null.deviance | The null model’s deviance |
| reg_model$deviance | The model’s deviance |
| reg_model$df.residual | The model’s residual degrees of freedom |
| reg_model$df.null | The null model’s residual degrees of freedom |
| reg_model$model | The model frame: the data used to fit the model |
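
Most of the components in the table can also be read with extractor functions from the "stats" package, which is often preferred to using $ directly. The sketch below uses simulated data in the same style as the example above; the seed and the sample size of 200 are assumptions made for illustration.

```r
# Simulated data (seed and sample size are assumptions for illustration)
set.seed(1)
x1 = rnorm(200, 4, 2)
x2 = rnorm(200, 3, 1)
xb = 1 + 2 * x1 - 3 * x2
y = rbinom(200, 1, plogis(xb))

reg_model = glm(y ~ x1 + x2, family = binomial())

# Each extractor function returns the same values as the matching component
checks = c(
  coefficients = isTRUE(all.equal(coef(reg_model),
                                  reg_model$coefficients)),
  fitted.values = isTRUE(all.equal(fitted(reg_model),
                                   reg_model$fitted.values)),
  residuals = isTRUE(all.equal(residuals(reg_model, type = "working"),
                               reg_model$residuals)),
  deviance = isTRUE(all.equal(deviance(reg_model),
                              reg_model$deviance)),
  aic = isTRUE(all.equal(AIC(reg_model), reg_model$aic)),
  df.residual = df.residual(reg_model) == reg_model$df.residual
)
checks
```

Note that residuals() returns deviance residuals by default, so type = "working" is needed to match reg_model$residuals.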

Copyright © 2020 - 2024. All Rights Reserved by Stats Codes