Here, we discuss the binomial family GLM in R, its interpretation, and its link functions, including logit, probit, cauchit, log, and cloglog.

The binomial family generalized linear model (GLM) can be fitted in R with the glm() function from the base "stats" package.

The binomial family GLM can be used to study the relationship, if one exists, between a Bernoulli or binomially distributed dependent variable \((y)\) and a set of independent variables \((X = x_1, x_2, \ldots, x_m)\).

The binomial family GLM framework is based on the theoretical assumption that:

\[y\sim \text{Bernoulli}(p),\]

where \(p = E(y|X) = E(y|x_1, x_2, \ldots, x_m)\), and, for a link function \(g\),

\[\begin{aligned} g(p) & = g(E(y|X)) \\ & = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m. \end{aligned}\]
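To make this assumption concrete, the sketch below simulates data that satisfy it, using the logit link and hypothetical "true" coefficients chosen only for illustration (they are not from this page), and then checks that glm() recovers them approximately.

```r
# Simulate from the assumed model: y ~ Bernoulli(p), logit(p) = b0 + b1*x1 + b2*x2
set.seed(123)                       # arbitrary seed, for reproducibility
n = 5000
x1 = rnorm(n)
x2 = rnorm(n)
beta = c(-0.5, 1.2, -0.8)           # hypothetical true (b0, b1, b2)

eta = beta[1] + beta[2] * x1 + beta[3] * x2   # linear predictor g(p)
p = plogis(eta)                     # inverse logit: exp(eta) / (1 + exp(eta))
y = rbinom(n, size = 1, prob = p)   # one Bernoulli draw per observation

# Fit the binomial family GLM; the estimates should be close to beta
sim_fit = glm(y ~ x1 + x2, family = binomial(link = "logit"))
round(coef(sim_fit), 2)
```

With a sample this large the estimates typically land within a few hundredths of the chosen values; with small samples, such as the 12 observations used below, they are much less precise.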

The binomial family GLM then estimates the true coefficients, \(\beta_1, \beta_2, \ldots, \beta_m\), as \(\widehat \beta_1, \widehat \beta_2, \ldots, \widehat \beta_m\), and the true intercept, \(\beta_0\), as \(\widehat \beta_0\).
Then, for any values of \(x_1, x_2, \ldots, x_m\), these estimates are used to estimate the true \(p\) as \(\widehat p\), through the inverse link function \(g^{-1}\):

\[\widehat p = g^{-1}\left( \widehat \beta_0 + \widehat \beta_1 x_1 + \widehat \beta_2 x_2 + \cdots + \widehat \beta_m x_m \right).\]
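The inverse link \(g^{-1}\) for each binomial link in R can be evaluated directly. This sketch uses make.link() from the "stats" package, whose returned list holds the link \(g\) as $linkfun and its inverse \(g^{-1}\) as $linkinv; the family object returned by binomial(link = ...) carries the same two components. The eta values are arbitrary illustrations.

```r
# Inverse links g^{-1}(eta) for the binomial links available in R
eta = c(-2, -0.5, 0, 0.5, 1)        # example values of the linear predictor

links = c("logit", "probit", "cauchit", "cloglog")
p_hat = sapply(links, function(l) make.link(l)$linkinv(eta))
round(p_hat, 4)                     # one column per link, all inside (0, 1)

# The same inverse links written in closed form
stopifnot(
  isTRUE(all.equal(p_hat[, "logit"],   exp(eta) / (1 + exp(eta)))),
  isTRUE(all.equal(p_hat[, "probit"],  pnorm(eta))),
  isTRUE(all.equal(p_hat[, "cauchit"], pcauchy(eta))),
  isTRUE(all.equal(p_hat[, "cloglog"], 1 - exp(-exp(eta))))
)

# The log link's inverse is exp(eta), which is a valid probability
# only when eta <= 0; here exp(1) is greater than 1
make.link("log")$linkinv(eta)
```

This is why fits with the log link can fail to find valid starting values unless start is supplied.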

See also logistic regression.

Sample Steps to Run a Binomial Family Generalized Linear Model:

# Create the data samples for the binomial family GLM
# Values are matched based on matching position in each sample

y = c(1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1)
x1 = c(4.19, 5.59, 3.50, 2.95, 4.93, 5.60,
       4.57, 4.94, 5.43, 5.14, 3.97, 5.30)
x2 = c(2.15, 2.67, 2.52, 2.79, 2.30, 2.97,
       3.13, 3.34, 2.26, 3.14, 2.89, 2.52)
bf_data = data.frame(y, x1, x2)
bf_data

# Run the binomial family GLM

model = glm(y ~ x1 + x2, family = binomial(link = "logit"))
summary(model)

# Or, equivalently, supply the data frame with the data argument

model = glm(y ~ x1 + x2, family = binomial(link = "logit"),
            data = bf_data)
summary(model)

   y   x1   x2
1  1 4.19 2.15
2  1 5.59 2.67
3  0 3.50 2.52
4  0 2.95 2.79
5  1 4.93 2.30
6  0 5.60 2.97
7  0 4.57 3.13
8  0 4.94 3.34
9  1 5.43 2.26
10 1 5.14 3.14
11 0 3.97 2.89
12 1 5.30 2.52

Call:
glm(formula = y ~ x1 + x2, family = binomial(link = "logit"))

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)    4.026      8.866   0.454   0.6498  
x1             2.479      1.497   1.656   0.0977 .
x2            -5.725      3.192  -1.793   0.0729 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 16.6355  on 11  degrees of freedom
Residual deviance:  6.7805  on  9  degrees of freedom
AIC: 12.78

Number of Fisher Scoring iterations: 5
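The deviances and AIC in the output are related. A short sketch, re-creating the example model, shows how: the drop from the null deviance to the residual deviance gives a likelihood-ratio test of x1 and x2 together, and for 0/1 responses the AIC equals the residual deviance plus twice the number of coefficients (6.7805 + 2 × 3 ≈ 12.78).

```r
# Data and model from the example above
y = c(1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1)
x1 = c(4.19, 5.59, 3.50, 2.95, 4.93, 5.60, 4.57, 4.94, 5.43, 5.14, 3.97, 5.30)
x2 = c(2.15, 2.67, 2.52, 2.79, 2.30, 2.97, 3.13, 3.34, 2.26, 3.14, 2.89, 2.52)
bf_data = data.frame(y, x1, x2)
model = glm(y ~ x1 + x2, family = binomial(link = "logit"), data = bf_data)

# Likelihood-ratio test of x1 and x2 jointly against the intercept-only model
lr_stat = model$null.deviance - model$deviance   # null minus residual deviance
lr_df = model$df.null - model$df.residual        # 11 - 9 = 2
p_value = pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(statistic = lr_stat, df = lr_df, p.value = p_value)

# For 0/1 data the saturated log-likelihood is 0, so deviance = -2 * logLik
# and AIC = deviance + 2 * (number of estimated coefficients)
c(AIC = AIC(model),
  deviance_plus_2k = deviance(model) + 2 * length(coef(model)))
```

With the deviances shown above, the statistic is 16.6355 - 6.7805 = 9.855 on 2 degrees of freedom, a p-value of about 0.0072, even though neither coefficient is significant at the 5% level on its own.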

Sample Interpretation of a Binomial Family GLM:

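The link function used in the example above is the "logit" link, \(g(p) = \log\left(\frac{p}{1 - p}\right)\), so inverting it gives \(p = \frac{\exp(X\beta)}{1 + \exp(X\beta)}\), where \(X\beta = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m\).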

For any \(x_1\) and \(x_2\), the estimated \(p\) or probability of success is: \[\widehat p = \frac{\exp(4.026 + 2.479x_1 - 5.725x_2)}{1 + \exp(4.026 + 2.479x_1 - 5.725x_2)}.\]
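The equation above can be evaluated in R either with predict() or by hand from coef(). The sketch below refits the example model and uses a hypothetical new point, x1 = 5 and x2 = 2.5, chosen only for illustration. It also exponentiates the coefficients: under the logit link, \(\exp(\widehat\beta_j)\) is the estimated odds ratio for a one-unit increase in \(x_j\) with the other variable held fixed.

```r
# Data and model from the example above
y = c(1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1)
x1 = c(4.19, 5.59, 3.50, 2.95, 4.93, 5.60, 4.57, 4.94, 5.43, 5.14, 3.97, 5.30)
x2 = c(2.15, 2.67, 2.52, 2.79, 2.30, 2.97, 3.13, 3.34, 2.26, 3.14, 2.89, 2.52)
bf_data = data.frame(y, x1, x2)
model = glm(y ~ x1 + x2, family = binomial(link = "logit"), data = bf_data)

# Hypothetical point at which to estimate p
new_point = data.frame(x1 = 5, x2 = 2.5)

# 1) predict() with type = "response" returns p-hat on the probability scale
p_pred = predict(model, newdata = new_point, type = "response")

# 2) By hand: linear predictor, then the inverse logit
b = coef(model)
eta = b[["(Intercept)"]] + b[["x1"]] * 5 + b[["x2"]] * 2.5
p_hand = exp(eta) / (1 + exp(eta))
c(predict = unname(p_pred), by_hand = p_hand)

# Odds ratios: multiplicative change in the odds p / (1 - p)
# per one-unit increase in each variable, the other held fixed
exp(coef(model))
```

Plugging the rounded coefficients into the equation gives \(4.026 + 2.479(5) - 5.725(2.5) = 2.1085\) and \(\widehat p \approx 0.89\). Likewise, \(\exp(2.479) \approx 11.9\), so each one-unit increase in \(x_1\) multiplies the estimated odds of \(y = 1\) by about 11.9 with \(x_2\) held fixed; with only 12 observations these estimates are imprecise.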

Table of Some Binomial Family Generalized Linear Model Arguments in R
| Argument | Usage |
|---|---|
| y ~ x1 + x2 + … + xm | y is the dependent sample, and x1, x2, …, xm are the independent samples |
| y ~ ., data | y is the dependent sample, and "." means all other variables in data are included in the model |
| family = binomial() | Sets the binomial family; the link can be set inside (), and the default is "logit" |
| link | Argument of binomial(): the link function between the dependent variable and the independent variables ("logit", "probit", "cauchit", "log", or "cloglog") |
| data | The data frame object that contains the dependent and independent variables |
| start | Starting values for the coefficients to be estimated in the model |
| weights | For a binomial variable, when the proportion of success is the dependent variable: a vector of the number of cases (trials) for each proportion of success |
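The weights and start arguments in the table can be illustrated with grouped data. The sketch uses hypothetical counts (not from this page): the proportion of successes is the response, and weights gives the number of trials behind each proportion, which fits the same model as the two-column cbind(successes, failures) response. start is shown with a probit fit, where it is optional; it matters more when the default starting values fail, as can happen with the log link.

```r
# Hypothetical grouped data: n trials and a number of successes at each dose
dose = c(1, 2, 3, 4, 5)
n = c(20, 20, 20, 20, 20)
successes = c(2, 5, 9, 14, 18)
prop = successes / n                # proportion of success per group

# Proportion as the response, weights = number of trials per proportion
fit_w = glm(prop ~ dose, family = binomial(link = "logit"), weights = n)

# Equivalent two-column response: cbind(successes, failures)
fit_c = glm(cbind(successes, n - successes) ~ dose,
            family = binomial(link = "logit"))

# Both forms give the same coefficient estimates
cbind(prop_weights = coef(fit_w), two_column = coef(fit_c))

# start: initial values for (intercept, slope), here for a probit fit
fit_p = glm(prop ~ dose, family = binomial(link = "probit"),
            weights = n, start = c(0, 0))
coef(fit_p)
```

Without weights, a proportion response is treated as a single trial per row, and glm() warns about non-integer numbers of successes.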

Copyright © 2020 - 2024. All Rights Reserved by Stats Codes