Here, we discuss generalized linear models (GLMs) in R with interpretations, covering the binomial, Gaussian, Poisson, and gamma families.
A generalized linear model can be fitted in R with the glm() function from the base "stats" package.
The generalized linear model can be used to study possibly non-linear relationships between a dependent variable \((y)\), with a specified distribution, and a set of independent variables \((X = x_1, x_2, \ldots, x_p)\).
The generalized linear model framework assumes that a link function \(g\) relates the expected value of \(y\) to a linear combination of the independent variables:
\[\begin{align} g\left(E(y|X)\right) & = g(\mu) \\ & = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p. \end{align}\]
The generalized linear model estimates the true coefficients, \(\beta_1, \beta_2, \ldots, \beta_p\), as \(\widehat \beta_1, \widehat \beta_2, \ldots, \widehat \beta_p\), and the true intercept, \(\beta_0\), as \(\widehat \beta_0\).
For any values of \(x_1, x_2, \ldots, x_p\), these estimates are then used to predict or estimate the true \(\mu\) as \(\widehat \mu\), with the equation below:
\[\widehat \mu = g^{-1}\left( \widehat \beta_0 + \widehat \beta_1 x_1 + \widehat \beta_2 x_2 + \cdots + \widehat \beta_p x_p \right).\]
See also logistic regression and binomial family GLM.
# Create the data samples for the GLM
# Values are matched based on matching position in each sample
x1 = c(4.35, 6.18, 5.13, 5.80, 4.00, 4.15,
5.09, 6.27, 7.15, 3.14, 4.75, 6.01)
x2 = c(3.76, 2.86, 3.85, 2.42, 1.84, 1.55,
3.25, 3.87, 3.64, 2.16, 3.38, 4.03)
y = c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
glm_data = data.frame(y, x1, x2)
glm_data
y x1 x2
1 0 4.35 3.76
2 1 6.18 2.86
3 0 5.13 3.85
4 1 5.80 2.42
5 1 4.00 1.84
6 1 4.15 1.55
7 0 5.09 3.25
8 0 6.27 3.87
9 1 7.15 3.64
10 0 3.14 2.16
11 1 4.75 3.38
12 0 6.01 4.03
# Run the generalized linear model with family, link and variables specified
model = glm(y ~ x1 + x2, family = binomial(link = "logit"),
data = glm_data)
summary(model)
Call:
glm(formula = y ~ x1 + x2, family = binomial(link = "logit"),
data = glm_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.037 4.146 0.250 0.8026
x1 2.191 1.306 1.678 0.0934 .
x2 -3.999 2.169 -1.844 0.0652 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 16.6355 on 11 degrees of freedom
Residual deviance: 7.6478 on 9 degrees of freedom
AIC: 13.648
Number of Fisher Scoring iterations: 6
The example above is of the binomial family, hence the parameter of interest is \(p\) (see below). It also uses the "logit" link, hence the parameter function is \(p = \frac{\exp(X\beta)}{1 + \exp(X\beta)}\).
These imply for any \(x_1\) and \(x_2\), the estimated \(p\) or probability of success is: \[\widehat p = \frac{\exp(1.037 + 2.191x_1 - 3.999x_2)}{1 + \exp(1.037 + 2.191x_1 - 3.999x_2)}\]
Also, the lower the AIC, the better the model fit.
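The estimated equation above can also be checked in code. The sketch below re-creates the small data set and model from above, applies the inverse-logit to the linear predictor by hand, and compares the result with `predict()` on the response scale.

```r
# Re-create the small example data set and model from above
x1 = c(4.35, 6.18, 5.13, 5.80, 4.00, 4.15,
       5.09, 6.27, 7.15, 3.14, 4.75, 6.01)
x2 = c(3.76, 2.86, 3.85, 2.42, 1.84, 1.55,
       3.25, 3.87, 3.64, 2.16, 3.38, 4.03)
y = c(0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0)
glm_data = data.frame(y, x1, x2)
model = glm(y ~ x1 + x2, family = binomial(link = "logit"),
            data = glm_data)

# Apply the inverse-logit to the linear predictor by hand
b = coef(model)
xb = b[1] + b[2]*glm_data$x1 + b[3]*glm_data$x2
p_manual = exp(xb) / (1 + exp(xb))

# This should match the built-in prediction on the response scale
all.equal(unname(p_manual), unname(predict(model, type = "response")))
```

Note that `predict(model, type = "response")` applies \(g^{-1}\) for you; `predict(model)` alone returns the linear predictor \(X\widehat\beta\).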
| Family | Links | Distribution | Dependent Variable |
|---|---|---|---|
| Binomial `binomial` Parameter: \(p\) | `logit` (default), `probit`, `cauchit`, `log`, `cloglog` | Bernoulli \((p)\): \(\{0, 1\}\) | 1's & 0's, or a factor with two levels: 'Yes' & 'No' |
| Binomial `binomial` Parameter: \(p\) | `logit` (default), `probit`, `cauchit`, `log`, `cloglog` | Binomial \((n, p)\): \(\{0, 1, \ldots, n\}\) | Two-column matrix with counts of 'successes' and counts of 'failures' |
| Gaussian `gaussian` Parameter: \(\mu\) | `identity` (default), `log`, `inverse` | Normal \((\mu, \sigma)\): \((-\infty, \infty)\) | Continuous variable with a normal/Gaussian distribution |
| Poisson `poisson` Parameter: \(\lambda\) | `log` (default), `identity`, `sqrt` | Poisson \((\lambda)\): \(\{0, 1, 2, \ldots\}\) | Counts of events in a fixed space or time period |
| Gamma `Gamma` Parameter (mean): \(\alpha/\beta\) | `inverse` (default), `identity`, `log` | Gamma \((\alpha, \beta)\): \((0, \infty)\) | Continuous variable with a gamma distribution |
| Gamma `Gamma` Parameter (mean): \(\alpha/\beta\) | `inverse` (default), `identity`, `log` | Exponential \((\lambda)\): \((0, \infty)\); Gamma \((1, \beta) \equiv\) Exponential \((\beta)\) | Continuous variable with an exponential distribution |
| Inverse Gaussian `inverse.gaussian` Parameter: \(\mu\) | `1/mu^2` (default), `inverse`, `identity`, `log` | Inverse Gaussian \((\mu, \lambda)\): \((0, \infty)\) | Continuous variable with an inverse Gaussian distribution |
In the table below, \(X\beta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\), and the parameter \(\mu\) or \(p\) is the expected value of \(y\) \((E(y|X))\).

| Link | Link Function | Parameter Function |
|---|---|---|
| Logit `logit` | \(\log\left(\frac{p}{1-p}\right) = X\beta\) | \(p = \frac{\exp(X\beta)}{1 + \exp(X\beta)}\) |
| Probit `probit` | \(\Phi^{-1}(p) = X\beta\) | \(p = \Phi(X\beta)\), where \(\Phi\) is the cumulative distribution function of the standard normal distribution |
| Cauchit `cauchit` | \(\tan\left(\pi\left(p-\frac{1}{2}\right)\right) = X\beta\) | \(p = \frac{1}{\pi}\arctan\left(X\beta\right) + \frac{1}{2}\); similar to the probit, but based on the cumulative distribution function of the standard Cauchy distribution |
| Complementary log-log `cloglog` | \(\log(-\log(1-p)) = X\beta\) | \(p = 1-\exp(-\exp(X\beta))\) |
| Identity `identity` | \(\mu = X\beta\) | \(\mu = X\beta\) |
| Log `log` | \(\log(\mu) = X\beta\) | \(\mu = \exp(X\beta)\) |
| Inverse `inverse` | \(\frac{1}{\mu} = X\beta\) | \(\mu = \frac{1}{X\beta}\) |
| Square root `sqrt` | \(\sqrt{\mu} = X\beta\) | \(\mu = (X\beta)^2\) |
| Inverse square `1/mu^2` | \(\frac{1}{\mu^2} = X\beta\) | \(\mu = \frac{1}{\sqrt{X\beta}}\) |
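The link and parameter (inverse-link) functions above are also available programmatically through `stats::make.link()`, which returns a `linkfun` and its inverse `linkinv`. A short sketch for the logit case:

```r
# make.link() returns the link function (linkfun) and its inverse (linkinv)
lg = make.link("logit")

p = 0.75
eta = lg$linkfun(p)   # the link function: log(p / (1 - p))

# Round-trip through the inverse link recovers p
lg$linkinv(eta)

# Check the table's logit formulas directly
all.equal(eta, log(p / (1 - p)))
all.equal(lg$linkinv(eta), exp(eta) / (1 + exp(eta)))
```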
| Argument | Usage |
|---|---|
| `y ~ x1 + x2 + … + xp` | `y` is the dependent variable, and `x1, x2, …, xp` are the independent variables |
| `y ~ ., data` | `y` is the dependent variable, and "`.`" means all other variables in `data` enter the model |
| `family` | The GLM family, based on the distribution of the dependent variable |
| `link` | The link function between the dependent variable and the independent variables |
| `data` | The data frame object that contains the dependent and independent variables |
| `start` | Optional starting values for the coefficients, to aid convergence |
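To illustrate the `y ~ .` shorthand, the sketch below (using simulated data) fits the same binomial model both ways and confirms the coefficient estimates agree:

```r
# With "y ~ .", every column of the data frame other than y enters the model
set.seed(1)
df = data.frame(x1 = rnorm(50), x2 = rnorm(50))
df$y = rbinom(50, size = 1, prob = plogis(0.5 + df$x1 - df$x2))

m_dot  = glm(y ~ ., family = binomial(), data = df)
m_full = glm(y ~ x1 + x2, family = binomial(), data = df)

# Both formulas specify the same model
all.equal(coef(m_dot), coef(m_full))
```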
set.seed(123)
n = 1000
# Simulate linear combination (mean 0)
x1 = rnorm(n, mean = 4, sd = 0.5)
x2 = rnorm(n, mean = 3, sd = 0.25)
xb = 3 + 3*x1 - 5*x2
# Simulate logit Bernoulli 1's and 0's
p = exp(xb)/(1+exp(xb))
y_bernoulli = rbinom(n, size = 1, prob = p)
# Run GLM for binomial family with logit
model = glm(y_bernoulli ~ x1 + x2,
family = binomial(link = "logit"))
summary(model)
Call:
glm(formula = y_bernoulli ~ x1 + x2, family = binomial(link = "logit"))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.7353 1.1645 2.349 0.0188 *
x1 3.3301 0.2336 14.253 <2e-16 ***
x2 -5.3653 0.4286 -12.519 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1385.51 on 999 degrees of freedom
Residual deviance: 911.91 on 997 degrees of freedom
AIC: 917.91
Number of Fisher Scoring iterations: 5
Model to estimate \(\widehat p\): \[\widehat p = \frac{\exp(2.7353 + 3.3301 x_1 - 5.3653 x_2)}{1 + \exp(2.7353 + 3.3301 x_1 - 5.3653 x_2)}.\]
set.seed(100)
n = 1000
# Simulate linear combination (mean 0)
x1 = rnorm(n, mean = 4, sd = 0.025)
x2 = rnorm(n, mean = 3, sd = 0.035)
xb = 3 + 3*x1 - 5*x2
# Simulate probit binomial variables
p = pnorm(xb)
success = rbinom(n, size = 20, prob = p)
y_binomial = cbind(success, 20 - success)
# Run GLM for binomial family with probit
model = glm(y_binomial ~ x1 + x2,
family = binomial(link = "probit"))
summary(model)
Call:
glm(formula = y_binomial ~ x1 + x2, family = binomial(link = "probit"))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.7426 1.5814 1.734 0.0829 .
x1 3.3704 0.3479 9.689 <2e-16 ***
x2 -5.4050 0.2635 -20.512 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1543.0 on 999 degrees of freedom
Residual deviance: 1030.3 on 997 degrees of freedom
AIC: 4427.6
Number of Fisher Scoring iterations: 3
Model to estimate \(\widehat p\): \[\widehat p = \Phi(2.7426 + 3.3704 x_1 - 5.4050 x_2).\]
set.seed(123)
n = 1000
# Simulate linear combination (must be positive)
x1 = rnorm(n, mean = 0.5, sd = 0.05)
x2 = rnorm(n, mean = 0.3, sd = 0.04)
xb = 2 + 3*x1 - 4*x2
# Simulate Poisson variables with exponent xb
lmb = exp(xb)
y_poisson = rpois(n, lambda = lmb)
# Run GLM for Poisson family with log link
model = glm(y_poisson ~ x1 + x2,
family = poisson(link = "log"))
summary(model)
Call:
glm(formula = y_poisson ~ x1 + x2, family = poisson(link = "log"))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.8699 0.1220 15.33 <2e-16 ***
x1 3.1603 0.2026 15.60 <2e-16 ***
x2 -3.9220 0.2495 -15.72 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1427.1 on 999 degrees of freedom
Residual deviance: 975.9 on 997 degrees of freedom
AIC: 5051.8
Number of Fisher Scoring iterations: 4
Model to estimate \(\widehat \lambda\): \[\widehat \lambda = \exp(1.8699 + 3.1603 x_1 - 3.9220 x_2).\]
set.seed(123)
n = 1000
# Simulate linear combination (must be positive)
x1 = rnorm(n, mean = 2, sd = 0.1)
x2 = rnorm(n, mean = 3, sd = 0.1)
xb = 3 + 4*x1 - 3*x2
# Simulate Poisson variables with square xb
lmb = xb^2
y_poisson = rpois(n, lambda = lmb)
# Run GLM for Poisson family with square root link
model = glm(y_poisson ~ x1 + x2,
family = poisson(link = "sqrt"))
summary(model)
Call:
glm(formula = y_poisson ~ x1 + x2, family = poisson(link = "sqrt"))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.2188 0.5477 5.877 4.18e-09 ***
x1 3.8461 0.1601 24.021 < 2e-16 ***
x2 -2.9787 0.1573 -18.940 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1874.2 on 999 degrees of freedom
Residual deviance: 1033.4 on 997 degrees of freedom
AIC: 4036.7
Number of Fisher Scoring iterations: 4
Model to estimate \(\widehat \lambda\): \[\widehat \lambda = (3.2188 + 3.8461 x_1 - 2.9787 x_2)^2.\]
set.seed(123)
n = 1000
# Simulate linear combination
# For the inverse link, keep xb bounded away from 0
x1 = rnorm(n, mean = 5, sd = 0.25)
x2 = rnorm(n, mean = 4, sd = 0.25)
xb = 3 + 2*x1 - 4*x2
# Simulate normal variables with inverse xb
mu = 1/xb
y_gaussian = rnorm(n, mu, 0.5)
# Run GLM for Gaussian family with inverse
model = glm(y_gaussian ~ x1 + x2,
family = gaussian(link = "inverse"))
summary(model)
Call:
glm(formula = y_gaussian ~ x1 + x2, family = gaussian(link = "inverse"))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.91729 0.14210 20.53 <2e-16 ***
x1 1.89521 0.06733 28.15 <2e-16 ***
x2 -3.81398 0.12456 -30.62 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.2393183)
Null deviance: 593.81 on 999 degrees of freedom
Residual deviance: 238.61 on 997 degrees of freedom
AIC: 1413
Number of Fisher Scoring iterations: 7
Model to estimate \(\widehat \mu\): \[\widehat \mu = \frac{1}{2.91729 + 1.89521 x_1 - 3.81398 x_2}.\]
set.seed(123)
n = 1000
# Simulate linear combination (positive values)
x1 = rnorm(n, mean = 5, sd = 0.25)
x2 = rnorm(n, mean = 4, sd = 0.25)
xb = 9 + 2*x1 - 4*x2
# Simulate normal variables with exponent xb
mu = exp(xb)
y = rnorm(n, mu, 0.5)
# Run GLM for Gaussian family with log link
model = glm(y ~ x1 + x2,
family = gaussian(link = "log"))
summary(model)
Call:
glm(formula = y ~ x1 + x2, family = gaussian(link = "log"))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.003332 0.006822 1320 <2e-16 ***
x1 1.999409 0.001186 1686 <2e-16 ***
x2 -4.000175 0.001396 -2865 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.2393446)
Null deviance: 2275692.30 on 999 degrees of freedom
Residual deviance: 238.63 on 997 degrees of freedom
AIC: 1413
Number of Fisher Scoring iterations: 3
Model to estimate \(\widehat \mu\): \[\widehat \mu = \exp(9.003332 + 1.999409 x_1 - 4.000175 x_2).\]
# Compare with OLS on the log scale
summary(lm(log(y) ~ x1 + x2))
Call:
lm(formula = log(y) ~ x1 + x2)
Residuals:
Min 1Q Median 3Q Max
-0.86680 -0.01391 -0.00124 0.01711 0.73651
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.959441 0.061884 144.8 <2e-16 ***
x1 2.017755 0.010145 198.9 <2e-16 ***
x2 -4.012752 0.009964 -402.7 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.0792 on 997 degrees of freedom
Multiple R-squared: 0.9948, Adjusted R-squared: 0.9948
F-statistic: 9.465e+04 on 2 and 997 DF, p-value: < 2.2e-16
NOTE: As expected, the results are similar. However, for the same data, the OLS model should fit better on the log(y) scale (its own dependent variable), while the GLM should fit better on the level scale (its dependent variable).
set.seed(123)
n = 1000
# Simulate linear combination
x1 = rnorm(n, mean = 6, sd = 2)
x2 = rnorm(n, mean = 5, sd = 2)
xb = 5 + 4*x1 - 6*x2
# Simulate normal variable with identity
mu = xb
y = rnorm(n, mu, 0.5)
# Run GLM for Gaussian family with identity link
model = glm(y ~ x1 + x2,
family = gaussian(link = "identity"))
summary(model)
Call:
glm(formula = y ~ x1 + x2, family = gaussian(link = "identity"))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.987387 0.060707 82.16 <2e-16 ***
x1 3.994627 0.007836 509.79 <2e-16 ***
x2 -5.993123 0.007696 -778.70 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.2394945)
Null deviance: 192697.76 on 999 degrees of freedom
Residual deviance: 238.78 on 997 degrees of freedom
AIC: 1413.6
Number of Fisher Scoring iterations: 2
Model to estimate \(\widehat \mu\): \[\widehat \mu = 4.987387 + 3.994627 x_1 - 5.993123 x_2.\]
# Compare with OLS on the same data
summary(lm(y ~ x1 + x2))
Call:
lm(formula = y ~ x1 + x2)
Residuals:
Min 1Q Median 3Q Max
-1.41801 -0.31386 -0.01849 0.32692 1.68933
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.987387 0.060707 82.16 <2e-16 ***
x1 3.994627 0.007836 509.79 <2e-16 ***
x2 -5.993123 0.007696 -778.70 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4894 on 997 degrees of freedom
Multiple R-squared: 0.9988, Adjusted R-squared: 0.9988
F-statistic: 4.018e+05 on 2 and 997 DF, p-value: < 2.2e-16
NOTE: As expected, the OLS, and the GLM with the "identity" link produced the same result.
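This equivalence can be verified directly: a Gaussian GLM with the "identity" link and lm() solve the same least-squares problem, so their estimates coincide. A sketch with freshly simulated data:

```r
# Gaussian GLM with identity link versus OLS on the same data
set.seed(123)
n = 100
x1 = rnorm(n, mean = 6, sd = 2)
x2 = rnorm(n, mean = 5, sd = 2)
y = rnorm(n, mean = 5 + 4*x1 - 6*x2, sd = 0.5)

fit_glm = glm(y ~ x1 + x2, family = gaussian(link = "identity"))
fit_lm  = lm(y ~ x1 + x2)

# Coefficients and fitted values agree
all.equal(coef(fit_glm), coef(fit_lm))
all.equal(fitted(fit_glm), fitted(fit_lm))
```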
set.seed(123)
n = 1000
# Simulate linear combination
x1 = rnorm(n, mean = 10, sd = 1)
x2 = rnorm(n, mean = 8, sd = 1)
xb = 3 + 4*x1 - 5*x2
# Simulate gamma variables
average = exp(xb)
# average = shape/rate or rate = shape/average
y_gamma = rgamma(n, shape = 1, rate = 1/average)
# Run GLM for gamma family with log link
model = glm(y_gamma ~ x1 + x2,
family = Gamma(link = "log"))
summary(model)
Call:
glm(formula = y_gamma ~ x1 + x2, family = Gamma(link = "log"))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.91850 0.40803 9.603 <2e-16 ***
x1 3.94799 0.03331 118.507 <2e-16 ***
x2 -5.04674 0.03272 -154.235 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 1.082247)
Null deviance: 24128.1 on 999 degrees of freedom
Residual deviance: 1112.9 on 997 degrees of freedom
AIC: 7767
Number of Fisher Scoring iterations: 6
Model to estimate \(\text{mean}\): \[\text{mean} = \frac{\widehat \alpha}{\widehat \beta} = \exp \left( 3.91850 + 3.94799 x_1 - 5.04674 x_2 \right).\]
set.seed(123)
n = 1000
# Simulate linear combination
x1 = rnorm(n, mean = 6, sd = 0.5)
x2 = rnorm(n, mean = 4, sd = 0.5)
xb = 4 + 4*x1 - 5*x2
# Simulate exponential variables
average = exp(xb)
# rgamma(n, shape = 1, rate = 1/average) is equivalent to rexp(n, rate = 1/average)
# average = 1/rate or rate = 1/average
y_exp = rexp(n, rate = 1/average)
# Run GLM for gamma family (exponential variable) with log link
model = glm(y_exp ~ x1 + x2,
family = Gamma(link = "log"))
summary(model)
Call:
glm(formula = y_exp ~ x1 + x2, family = Gamma(link = "log"))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.47013 0.44869 9.963 <2e-16 ***
x1 3.98958 0.06487 61.497 <2e-16 ***
x2 -5.09976 0.06372 -80.035 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 1.026001)
Null deviance: 9488.1 on 999 degrees of freedom
Residual deviance: 1152.6 on 997 degrees of freedom
AIC: 17886
Number of Fisher Scoring iterations: 6
Model to estimate \(\text{mean}\): \[\text{mean} = \frac{1}{\widehat \lambda} = \exp \left(4.47013 + 3.98958 x_1 - 5.09976 x_2 \right).\]
# Create data
x1 = rnorm(1000, 4, 2)
x2 = rnorm(1000, 3, 1)
xb = 1 + 2*x1 - 3*x2
p = exp(xb)/(1+exp(xb))
y = rbinom(1000, 1, p)
# Create generalized linear model summary and model objects
reg_summary = summary(glm(y ~ x1 + x2, family = binomial()))
reg_model = glm(y ~ x1 + x2, family = binomial())
# Extract a component from GLM summary object
reg_summary$coefficients; reg_summary$coefficients[, 1]
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.143466 0.4118878 2.776158 5.500548e-03
x1 1.819446 0.1275497 14.264599 3.636756e-46
x2 -2.771271 0.2084446 -13.295001 2.474716e-40
(Intercept) x1 x2
1.143466 1.819446 -2.771271
(Intercept) x1 x2
1.143466 1.819446 -2.771271
| GLM Component | Usage |
|---|---|
| `reg_summary$coefficients` | The estimated intercept and beta values, with their standard errors, z-values and p-values |
| `reg_summary$aic` | The Akaike Information Criterion (AIC) value |
| `reg_summary$null.deviance` | The null model's deviance |
| `reg_summary$deviance` | The model's deviance |
| `reg_summary$df.residual` | The model's residual degrees of freedom |
| `reg_summary$df.null` | The null model's residual degrees of freedom |
| `reg_model$coefficients` | The estimated intercept and beta values |
| `reg_model$fitted.values` | The predicted y values (on the response scale) |
| `reg_model$linear.predictors` | The predicted linear model values \((X\widehat\beta)\) |
| `reg_model$residuals` | The working residuals from the linear model |
| `reg_model$aic` | The Akaike Information Criterion (AIC) value |
| `reg_model$null.deviance` | The null model's deviance |
| `reg_model$deviance` | The model's deviance |
| `reg_model$df.residual` | The model's residual degrees of freedom |
| `reg_model$df.null` | The null model's residual degrees of freedom |
| `reg_model$model` | The model data frame |
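The `$` components above can also be read through R's generic accessor functions (`coef()`, `fitted()`, `AIC()`, `deviance()`), which is often the more idiomatic style. One caveat worth noting: `residuals()` on a GLM defaults to deviance residuals, not the working residuals stored in `$residuals`. A quick sketch:

```r
# Small simulated logistic-regression model
set.seed(42)
x = rnorm(100)
y = rbinom(100, size = 1, prob = plogis(0.5 + x))
m = glm(y ~ x, family = binomial())

# Generic accessors mirror the $ components above
all.equal(coef(m), m$coefficients)
all.equal(fitted(m), m$fitted.values)
all.equal(AIC(m), m$aic)
all.equal(deviance(m), m$deviance)

# residuals(m) returns deviance residuals by default;
# residuals(m, type = "working") matches m$residuals
all.equal(residuals(m, type = "working"), m$residuals)
```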
Copyright © 2020 - 2024. All Rights Reserved by Stats Codes