One-Line Summary: Extending linear regression to non-normal responses via link functions -- unifying logistic, Poisson, and other regression types.

Prerequisites: Linear regression (OLS, assumptions), probability distributions (Bernoulli, Poisson, exponential family), maximum likelihood estimation, calculus (chain rule, Newton's method).

What Is a Generalized Linear Model?

Ordinary linear regression assumes that the response variable is continuous and normally distributed around its mean. But what if you are predicting whether a customer will churn (binary outcome), how many accidents occur at an intersection per year (count data), or how long until a machine fails (positive continuous)? Forcing these responses into a standard linear regression violates fundamental assumptions: binary data is not Gaussian, counts cannot be negative, and durations are not symmetric.

Generalized linear models (GLMs) extend linear regression to handle all of these situations within a single unified framework. The key idea is elegant: instead of modeling the response mean directly as a linear function of predictors, GLMs model a transformation of the mean (the link function) as linear, while allowing the response to follow any distribution from the exponential family.

Think of it this way: linear regression draws a straight line through the data. GLMs draw a straight line through a transformed version of the data, then invert the transformation to produce predictions on the original scale.

How It Works

The Three Components of a GLM

Every GLM is specified by three components:

  1. Random Component: The response follows a distribution from the exponential family:

     $$f(y; \theta, \phi) = \exp\left( \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right)$$

     where $\theta$ is the natural (canonical) parameter, $\phi$ is a dispersion parameter, and $a(\cdot)$, $b(\cdot)$, $c(\cdot)$ are known functions defining the specific distribution. The mean and variance are:

     $$E[Y] = \mu = b'(\theta), \qquad \mathrm{Var}(Y) = b''(\theta)\, a(\phi)$$

  2. Systematic Component: A linear predictor formed from the covariates:

     $$\eta = \mathbf{x}^\top \boldsymbol{\beta} = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$

  3. Link Function: A monotonic, differentiable function $g$ that connects the mean to the linear predictor:

     $$g(\mu) = \eta$$

Exponential Family Distributions

The exponential family includes many common distributions:

| Distribution | Support | Natural Parameter $\theta$ | Canonical Link |
| --- | --- | --- | --- |
| Normal | $(-\infty, \infty)$ | $\mu$ | Identity: $g(\mu) = \mu$ |
| Bernoulli | $\{0, 1\}$ | $\log\frac{\mu}{1-\mu}$ | Logit: $g(\mu) = \log\frac{\mu}{1-\mu}$ |
| Poisson | $\{0, 1, 2, \ldots\}$ | $\log \mu$ | Log: $g(\mu) = \log \mu$ |
| Gamma | $(0, \infty)$ | $-1/\mu$ | Inverse: $g(\mu) = 1/\mu$ |

The canonical link function sets $g(\mu) = \theta$, linking the linear predictor directly to the natural parameter ($\eta = \theta$). Using the canonical link simplifies estimation and yields desirable statistical properties, but non-canonical links can also be used.

Logistic Regression as a GLM

Binary classification via logistic regression is a GLM with:

  • Random component: $Y \sim \mathrm{Bernoulli}(p)$, so $\mu = p$
  • Link function: logit, $g(p) = \log \frac{p}{1-p}$
  • Model: $\log \frac{p}{1-p} = \mathbf{x}^\top \boldsymbol{\beta}$

The inverse link gives the predicted probability: $p = g^{-1}(\eta) = \dfrac{1}{1 + e^{-\mathbf{x}^\top \boldsymbol{\beta}}}$

Coefficients are interpreted on the log-odds scale: a unit increase in $x_j$ multiplies the odds by $e^{\beta_j}$.
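
The inverse-logit link and the odds-ratio interpretation can be checked numerically. A minimal sketch with numpy; the coefficient values here are made up for illustration:

```python
import numpy as np

# Hypothetical fitted coefficients: intercept and one predictor.
beta = np.array([-1.0, 0.8])

def inverse_logit(eta):
    """Map the linear predictor back to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

x = np.array([1.0, 2.0])           # [intercept term, x1 = 2]
eta = x @ beta                     # linear predictor on the log-odds scale
p = inverse_logit(eta)             # predicted probability

# A unit increase in x1 multiplies the odds by exp(beta_1).
odds_before = p / (1 - p)
p_after = inverse_logit(np.array([1.0, 3.0]) @ beta)
odds_after = p_after / (1 - p_after)
print(np.isclose(odds_after / odds_before, np.exp(beta[1])))  # True
```

The odds ratio is exactly $e^{\beta_1}$ because the odds equal $e^{\eta}$ under the logit link, so adding $\beta_1$ to $\eta$ multiplies them by $e^{\beta_1}$.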

Poisson Regression as a GLM

For count data (e.g., number of insurance claims):

  • Random component: $Y \sim \mathrm{Poisson}(\mu)$
  • Link function: log, $g(\mu) = \log \mu$
  • Model: $\log \mu = \mathbf{x}^\top \boldsymbol{\beta}$

The inverse link ensures predictions are non-negative: $\mu = e^{\mathbf{x}^\top \boldsymbol{\beta}} > 0$. Coefficients are interpreted as multiplicative: a unit increase in $x_j$ multiplies the expected count by $e^{\beta_j}$.
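
Both properties of the log link can be verified directly. A short numpy sketch with hypothetical coefficient values:

```python
import numpy as np

# Hypothetical fitted coefficients for a Poisson GLM with log link.
beta = np.array([0.5, 0.3])

x = np.array([1.0, 4.0])            # [intercept term, x1 = 4]
mu = np.exp(x @ beta)               # inverse link: always positive

# A unit increase in x1 multiplies the expected count by exp(beta_1).
mu_after = np.exp(np.array([1.0, 5.0]) @ beta)
print(mu > 0, np.isclose(mu_after / mu, np.exp(beta[1])))  # True True
```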

Estimation via IRLS

GLMs are fit by maximum likelihood. The log-likelihood for the exponential family is:

$$\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right]$$

There is generally no closed-form solution (except for the normal distribution, which recovers OLS). Instead, we use Iteratively Reweighted Least Squares (IRLS):

  1. Initialize $\boldsymbol{\beta}^{(0)}$ (e.g., from OLS on the link-transformed responses).
  2. At each iteration $t$, compute working responses $z_i$ and weights $w_i$:

     $$z_i = \eta_i + (y_i - \mu_i)\, g'(\mu_i), \qquad w_i = \frac{1}{\mathrm{Var}(Y_i)\,[g'(\mu_i)]^2}$$

  3. Update:

     $$\boldsymbol{\beta}^{(t+1)} = (X^\top W X)^{-1} X^\top W \mathbf{z}$$

  4. Repeat until convergence.

Each iteration is a weighted least squares problem, hence the name. IRLS is a form of Fisher scoring, which is equivalent to Newton-Raphson when the canonical link is used.
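
The algorithm can be sketched from scratch for logistic regression, where the canonical logit link makes the weights simplify to $w_i = \mu_i(1 - \mu_i)$. A minimal numpy illustration, not a production implementation:

```python
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-8):
    """Fit logistic regression by IRLS (Fisher scoring with the canonical logit link)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))      # inverse logit
        w = mu * (1.0 - mu)                  # weights (canonical link: w_i = Var(Y_i))
        z = eta + (y - mu) / w               # working response
        # Weighted least squares update: beta = (X' W X)^{-1} X' W z
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Toy data: binary outcome driven by one covariate with true beta = (0.5, 1.5).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([np.ones(200), x1])
p = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x1)))
y = rng.binomial(1, p)

beta_hat = irls_logistic(X, y)
print(beta_hat)  # close to [0.5, 1.5], up to sampling noise
```

Each pass through the loop solves one weighted least squares problem, exactly as in the update formula above.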

Overdispersion

The Poisson distribution assumes $\mathrm{Var}(Y) = \mu$ (mean equals variance). In practice, count data often exhibits overdispersion: $\mathrm{Var}(Y) > \mu$. Ignoring overdispersion leads to underestimated standard errors and spuriously significant coefficients.

Remedies include:

  • Quasi-Poisson: Introduces a dispersion parameter $\phi$ so that $\mathrm{Var}(Y) = \phi \mu$, estimated from the data.
  • Negative binomial regression: Models count data with a variance function $\mathrm{Var}(Y) = \mu + \alpha \mu^2$, accommodating extra-Poisson variation.
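
Overdispersion can be diagnosed with the Pearson dispersion statistic, $\hat{\phi} = \sum_i (y_i - \hat{\mu}_i)^2/\hat{\mu}_i \,/\, (n - p)$, which is near 1 for equidispersed data. A simulated sketch (the negative binomial shape parameter is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu_true = 1000, 5.0

# Equidispersed counts (Poisson) vs. overdispersed counts (negative binomial
# with the same mean but variance mu + mu^2 / size).
y_pois = rng.poisson(mu_true, size=n)
size = 2.0  # assumed NB shape parameter
y_nb = rng.negative_binomial(size, size / (size + mu_true), size=n)

def pearson_dispersion(y, mu, n_params=1):
    """Quasi-Poisson dispersion: sum of squared Pearson residuals over (n - p)."""
    return np.sum((y - mu) ** 2 / mu) / (len(y) - n_params)

print(pearson_dispersion(y_pois, y_pois.mean()))  # near 1
print(pearson_dispersion(y_nb, y_nb.mean()))      # well above 1
```

Here the intercept-only fit $\hat{\mu}_i = \bar{y}$ stands in for a fitted GLM; with covariates, $\hat{\mu}_i$ would come from the model.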

Deviance as Goodness of Fit

The deviance generalizes the residual sum of squares to GLMs:

$$D = 2\phi \left[ \ell(\text{saturated}) - \ell(\hat{\boldsymbol{\beta}}) \right]$$

where the saturated model has one parameter per observation (fitting the data perfectly). The deviance measures how far the fitted model is from this perfect fit. For the normal distribution with identity link, the deviance reduces to the RSS.

The null deviance (intercept-only model) minus the residual deviance (fitted model) measures the explained deviance, analogous to $R^2$. For nested models, the difference in deviances follows approximately a $\chi^2$ distribution, enabling likelihood ratio tests.
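
For the Poisson case (with $\phi = 1$) the deviance has the closed form $D = 2\sum_i \left[ y_i \log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i) \right]$, taking $y_i \log(y_i/\hat{\mu}_i) = 0$ when $y_i = 0$. A small numpy sketch using an intercept-only fit as the comparison:

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance: twice the log-likelihood gap to the saturated model.
    The y * log(y / mu) term is taken as 0 when y = 0."""
    safe_y = np.where(y > 0, y, 1)
    term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return 2.0 * np.sum(term - (y - mu))

y = np.array([0, 2, 3, 5, 1])
mu_null = np.full(y.shape, y.mean())                # intercept-only fit
print(poisson_deviance(y, mu_null))                 # null deviance, > 0
print(poisson_deviance(y, y.astype(float) + 1e-9))  # saturated fit: ~ 0
```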

Why It Matters

GLMs unify an enormous range of statistical models under a single theoretical and computational framework. Instead of learning separate methods for binary outcomes, counts, and continuous data, a practitioner learns one framework and selects the appropriate distribution and link function for the problem at hand. This is both conceptually elegant and practically powerful. GLMs are the backbone of statistical modeling in epidemiology, insurance, ecology, and many other fields where response variables are not Gaussian.

Key Technical Details

  • GLMs estimate parameters (including intercept) via maximum likelihood, not OLS. Standard errors come from the observed or expected Fisher information matrix.
  • The canonical link guarantees that the sufficient statistic for $\boldsymbol{\beta}$ is $X^\top \mathbf{y}$, and the log-likelihood is concave, ensuring a unique maximum.
  • Residual types for GLMs include deviance residuals, Pearson residuals, and working residuals, each useful for different diagnostic purposes.
  • AIC and BIC can be used for model comparison across GLMs with the same response distribution.
  • Regularized GLMs (e.g., penalized logistic regression with $\ell_1$ or $\ell_2$ penalties) are standard in high-dimensional settings.
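
A minimal sketch of an $\ell_2$-penalized GLM, assuming scikit-learn is available (the data here is synthetic, for illustration only):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))     # many predictors relative to n
y = (X[:, 0] - X[:, 1] + rng.normal(size=100) > 0).astype(int)

# C is the INVERSE regularization strength: smaller C = stronger L2 penalty.
strong = LogisticRegression(penalty="l2", C=0.01, max_iter=1000).fit(X, y)
weak = LogisticRegression(penalty="l2", C=100.0, max_iter=1000).fit(X, y)

# Stronger penalties shrink the coefficient vector toward zero.
print(np.linalg.norm(strong.coef_) < np.linalg.norm(weak.coef_))  # True
```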

Common Misconceptions

  • "GLMs are nonlinear models." GLMs are linear in the link-transformed mean. The systematic component $\eta = \mathbf{x}^\top \boldsymbol{\beta}$ is linear; only the mapping from $\eta$ to $\mu$ is nonlinear.
  • "You need the canonical link." Non-canonical links are perfectly valid. For example, the probit link (inverse normal CDF) is a common alternative to logit for binary data and may be preferable when the latent variable interpretation is natural.
  • "R-squared works for GLMs." The classical $R^2$ is not well-defined for GLMs. Use deviance explained, pseudo-$R^2$ measures (McFadden's, Nagelkerke's), or information criteria instead.
  • "GLMs handle any response distribution." Only distributions in the exponential family are covered. Heavy-tailed distributions (e.g., the Cauchy) or mixture distributions require other approaches.
  • "Logistic regression is unrelated to linear regression." Logistic regression is a GLM that shares the same linear predictor structure as standard linear regression, differing only in the choice of distribution and link function.

Connections to Other Concepts

  • linear-regression.md: Linear regression is a GLM with a normal distribution and identity link function -- the simplest special case.
  • ridge-and-lasso-regression.md: Regularization extends directly to GLMs, producing penalized logistic regression and penalized Poisson regression for high-dimensional problems.
  • regression-diagnostics.md: GLMs have analogous diagnostic tools (deviance residuals, leverage in the working model, Cook's distance for GLMs) for checking model adequacy.
  • polynomial-regression.md: Polynomial and interaction terms can be included in the linear predictor of any GLM.
  • maximum-likelihood-estimation.md: GLM fitting is a direct application of MLE, with IRLS as the optimization algorithm.
  • zero-shot-classification.md: Logistic regression (a GLM) is the foundational classifier and the bridge between regression and classification in supervised learning.

Further Reading

  • McCullagh and Nelder, "Generalized Linear Models" (1989) -- The definitive reference on GLM theory, estimation, and diagnostics.
  • Nelder and Wedderburn, "Generalized Linear Models" (1972) -- The original paper introducing the GLM framework and IRLS.
  • Agresti, "Foundations of Linear and Generalized Linear Models" (2015) -- A modern and accessible treatment bridging linear models and GLMs.
  • Hastie, Tibshirani, and Friedman, "The Elements of Statistical Learning" (2009) -- Chapter 4 covers logistic regression in the GLM context with a machine learning perspective.