15  Multiple Regression

15.1 Multiple Regression in Context

Multiple regression extends the simple linear regression of Chapter 13 to more than one predictor. A single numeric response is modelled as a linear combination of several predictors, and the coefficients are estimated by ordinary least squares (Gauss 1809). The regression idea itself goes back to Galton’s study of inherited stature (Galton 1886).

Note: Two roles of a regression model

Descriptive, when the coefficients summarise how the response moves with each predictor after accounting for the others; predictive, when the fitted model is used to forecast the response for new rows. This chapter covers both uses; Chapters 16 to 22 build on the same machinery for richer models.

Tip: Where multiple regression fits in the triplet

Chapter 12 applied confirmatory tools to one variable, Chapter 13 to two variables, Chapter 14 to many variables without an identified response. Multiple regression picks one of those variables as a response and models it jointly from the rest.

15.2 The Multiple Regression Model

The model writes the expected value of y as a linear combination of k predictors plus an intercept: y = beta0 + beta1 x1 + beta2 x2 + ... + betak xk + error, where the error term has mean zero. OLS picks the coefficients that minimise the sum of squared residuals. Valid inference rests on a short list of assumptions: a linear mean structure, independent observations, constant residual variance, approximately normal residuals, and no exact linear dependence among the predictors.

flowchart LR
    A[Pick response y] --> B[Pick predictors x1..xk]
    B --> C[Fit OLS with lm]
    C --> D[Read summary: coef, R2, F]
    D --> E[Residual diagnostics]
    E --> F{Assumptions met?}
    F -->|No| G[Transform, add terms, or change model]
    G --> C
    F -->|Yes| H[Refine: selection, interactions]
    H --> I[Predict and report]

Warning: Linear here means linear in the coefficients

A multiple regression can carry polynomial terms, log transforms, and interactions, and still be linear in the sense that matters for OLS. What must not happen is a predictor that is an exact linear combination of the others.
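A short sketch (base R, simulated data; the variable names are illustrative) makes both points: lm happily fits log and polynomial terms, while an exactly redundant predictor is dropped with an NA coefficient.

```r
set.seed(42)
x1 <- runif(100, 1, 10)
x2 <- rnorm(100)
y  <- 2 + 0.5 * log(x1) + 1.5 * x2 - 0.3 * x2^2 + rnorm(100, sd = 0.2)

# Log and polynomial terms: still linear in the coefficients, so OLS applies
fit_ok <- lm(y ~ log(x1) + x2 + I(x2^2))
coef(fit_ok)

# An exact linear combination of the other predictors is not estimable
x3 <- 2 * x1 + 3 * x2            # exactly determined by x1 and x2
fit_bad <- lm(y ~ x1 + x2 + x3)
coef(fit_bad)["x3"]              # NA: dropped as aliased
```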

15.3 Fitting and Reading the Output

In R the model is fitted with lm and inspected with summary. The summary prints the coefficient table, residual summary, multiple and adjusted R-squared, and the overall F test.

Note: Five things to read off the summary

Estimated coefficient and its standard error, the t-statistic and p-value for each predictor, residual standard error, multiple and adjusted R-squared, and the overall F statistic with its p-value. Each block answers a distinct question about the fit.
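As a sketch, fitting and reading a model on simulated data (the spend/visits/value names echo the chapter's running example; the numbers here are invented):

```r
set.seed(1)
n <- 200
spend  <- runif(n, 0, 100)
visits <- rpois(n, lambda = 5)
value  <- 10 + 0.4 * spend + 3 * visits + rnorm(n, sd = 5)

fit <- lm(value ~ spend + visits)
summary(fit)                 # coefficient table, residual SE, R-squared, F test

# The same blocks, pulled out programmatically
coef(summary(fit))           # estimates, SEs, t statistics, p-values
summary(fit)$sigma           # residual standard error
summary(fit)$adj.r.squared   # adjusted R-squared
```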

15.4 Interpreting Coefficients

A coefficient in multiple regression is the expected change in the response for a one-unit change in that predictor, with the other predictors held constant. This “held constant” clause is what separates multiple regression from a set of simple regressions: the same slope can shrink, grow, or even change sign when other predictors are introduced.

Tip: Why the spend coefficient shrinks

Part of what looked like a spend effect in the simple model is carried by visits, which correlates with spend. The multiple-regression coefficient answers the cleaner question: how much does value move per unit of spend, at a fixed level of visits.
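A simulated sketch of the same phenomenon (variable names borrowed from the running example; the true spend slope here is set to 0.3):

```r
set.seed(7)
n <- 300
visits <- rpois(n, 6)
spend  <- 8 * visits + rnorm(n, sd = 10)   # spend and visits are correlated
value  <- 20 + 0.3 * spend + 5 * visits + rnorm(n, sd = 4)

coef(lm(value ~ spend))["spend"]           # inflated: absorbs part of the visits effect
coef(lm(value ~ spend + visits))["spend"]  # close to the true 0.3
```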

15.5 R-Squared, Adjusted R-Squared, and the F Test

R-squared reports the share of variance in y explained by the model. Adjusted R-squared penalises the addition of predictors that do not improve fit meaningfully. The overall F test compares the fitted model to an intercept-only null and answers whether the predictors jointly explain any variance (Fisher 1925).

Note: Adjusted R-squared is the fair comparison

R-squared never decreases when a predictor is added, even if the predictor is noise. Adjusted R-squared can decrease, which is why it is the right quantity to compare models with different numbers of predictors.
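A quick simulated check (base R): adding a pure-noise predictor cannot lower R-squared, while adjusted R-squared typically drops.

```r
set.seed(11)
n     <- 100
x     <- rnorm(n)
noise <- rnorm(n)            # unrelated to y by construction
y     <- 1 + 2 * x + rnorm(n)

fit1 <- lm(y ~ x)
fit2 <- lm(y ~ x + noise)

summary(fit1)$r.squared; summary(fit2)$r.squared          # second is never smaller
summary(fit1)$adj.r.squared; summary(fit2)$adj.r.squared  # second usually drops
```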

15.6 Residual Diagnostics

Before any coefficient is reported, the residuals are inspected. Calling R’s plot on a fitted model draws four diagnostic panels: residuals vs fitted (linearity), Q-Q plot (normality), scale-location (constant variance), and residuals vs leverage (influential points).

Warning: Diagnostics beat the p-value table

A clean coefficient table on a poorly specified model is worse than a messy table on a well-specified one. Always read the four panels before trusting any coefficient.
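A minimal sketch (simulated data) of producing the four panels, plus one numerical companion check:

```r
set.seed(3)
x1 <- runif(150)
x2 <- rnorm(150)
y  <- 1 + 2 * x1 - x2 + rnorm(150, sd = 0.3)
fit <- lm(y ~ x1 + x2)

par(mfrow = c(2, 2))   # lay all four panels out in one window
plot(fit)              # residuals vs fitted, Q-Q, scale-location, leverage

shapiro.test(residuals(fit))   # numerical check on residual normality
```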

15.7 Categorical Predictors

R turns a factor into a set of dummy variables automatically. One level is held out as the reference; each coefficient estimates the change in the response for that level relative to the reference, after accounting for the other predictors.

Tip: Setting the reference level

R uses alphabetical order by default. Use relevel(tier, ref = "Bronze") to force a business-meaningful baseline, which makes the coefficient table easier to read.
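A sketch with a simulated tier factor (note that "Bronze" happens to come first alphabetically here, so relevel simply makes the baseline choice explicit):

```r
set.seed(5)
tier  <- factor(sample(c("Bronze", "Silver", "Gold"), 120, replace = TRUE))
spend <- runif(120, 0, 50)
value <- 10 + 2 * (tier == "Silver") + 5 * (tier == "Gold") +
         0.3 * spend + rnorm(120)

levels(tier)    # "Bronze" "Gold" "Silver": alphabetical order

tier <- relevel(tier, ref = "Bronze")   # force the business baseline
fit  <- lm(value ~ tier + spend)
coef(fit)       # tierGold and tierSilver: effects relative to Bronze
```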

15.8 Interaction Terms

An interaction allows the effect of one predictor to depend on the level of another. In R, x1 * x2 expands to x1 + x2 + x1:x2 and adds one coefficient for the interaction. A full treatment of interactions as moderated effects is covered in Chapter 18.

Note: Interpreting an interaction coefficient

The interaction term is the difference in slope between levels. A positive interaction says the response rises faster with the predictor for that group than for the reference group. Add the main slope and the interaction to get the within-group slope.
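A simulated sketch (illustrative names; the true slope difference between groups is set to 1.5):

```r
set.seed(9)
n <- 200
group <- factor(sample(c("A", "B"), n, replace = TRUE))
x     <- runif(n, 0, 10)
# true slope: 1.0 for group A, 1.0 + 1.5 = 2.5 for group B
y <- 3 + 1.0 * x + 1.5 * x * (group == "B") + rnorm(n)

fit <- lm(y ~ x * group)   # expands to x + group + x:group
coef(fit)

# within-group slope for B = main slope + interaction
unname(coef(fit)["x"] + coef(fit)["x:groupB"])
```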

15.9 Variable Selection

When several candidate predictors are available, model selection picks a subset that balances fit against complexity. Stepwise selection via step walks the candidate space using either AIC (Akaike 1974) or BIC (Schwarz 1978). BIC is more conservative and prefers smaller models.

Warning: Stepwise is a tool, not a truth

Stepwise selection is sensitive to small changes in the data and can inflate significance of the retained predictors. Treat it as a starting point for a model that the analyst then justifies on theory and diagnostics.
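A sketch of both criteria on simulated data (only x1 and x2 carry signal; x3 and x4 are noise):

```r
set.seed(21)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 1.5 * d$x2 + rnorm(n)

full <- lm(y ~ ., data = d)
fit_aic <- step(full, trace = 0)               # AIC penalty (default, k = 2)
fit_bic <- step(full, trace = 0, k = log(n))   # BIC penalty: prefers smaller models

names(coef(fit_aic))
names(coef(fit_bic))
```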

15.10 Multicollinearity Revisited

Chapter 14 introduced VIF as a diagnostic for linearly redundant predictors. Applied to a fitted regression, a VIF above 5 signals meaningful correlation with the other predictors, and above 10 signals a problem that usually needs action.

Tip: Three common remedies

Drop one of a correlated pair, combine them into a single index, or keep both and switch to a regularised regression (ridge or lasso) where collinearity is handled explicitly. The right choice depends on whether the coefficients or the predictions matter more.
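In practice VIF usually comes from car::vif on the fitted model; the base-R sketch below (simulated data, hypothetical helper name) computes the same quantity from its definition, 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors.

```r
set.seed(13)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)   # strongly correlated with x1
x3 <- rnorm(n)
y  <- 1 + x1 + x2 + x3 + rnorm(n)

# VIF from first principles: regress each predictor on the others
vif_by_hand <- function(formula, data) {
  X <- model.matrix(formula, data)[, -1, drop = FALSE]   # drop the intercept
  sapply(colnames(X), function(v) {
    r2 <- summary(lm(X[, v] ~ X[, colnames(X) != v]))$r.squared
    1 / (1 - r2)
  })
}
vif_by_hand(y ~ x1 + x2 + x3, data.frame(y, x1, x2, x3))
# x1 and x2 show large VIFs; x3 stays near 1
```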

15.11 Prediction and Prediction Intervals

A fitted model is used for prediction with predict. The interval = "confidence" option returns an interval for the mean response at a given predictor value; interval = "prediction" returns the wider interval for a single new observation.

Note: Two intervals, two questions

The confidence interval answers “where is the average value for customers like these?” The prediction interval answers “where is a single new customer likely to land?” The prediction interval is always wider because it carries an extra source of variance.
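A minimal sketch on simulated data (illustrative names), showing both intervals at the same predictor value:

```r
set.seed(17)
n <- 120
spend <- runif(n, 0, 100)
value <- 15 + 0.5 * spend + rnorm(n, sd = 8)
fit <- lm(value ~ spend)

new <- data.frame(spend = 60)
predict(fit, new, interval = "confidence")   # where the mean response sits
predict(fit, new, interval = "prediction")   # where one new observation lands: wider
```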

15.12 Reporting a Multiple Regression

A regression report reuses the six-section skeleton from Chapters 11 to 14 and names the predictor set in place of the variable pair.

Tip: Six-section regression report

  1. Question and response
  2. Predictor set and sample
  3. Diagnostic panels and assumption checks
  4. Coefficient table with R-squared and F
  5. Effect sizes and confidence intervals
  6. Business decision with caveats

Keeping the skeleton stable across Chapters 11 to 15 lets readers compare an inferential study and a predictive model at a glance.

15.13 Summary

Summary of multiple-regression tools introduced in this chapter

Model and Fit
  Model form: y = beta0 + beta1 x1 + ... + betak xk + error, linear in the coefficients
  lm and summary: fit with lm(y ~ ...); summary returns the full coefficient and fit table
  R-squared, adjusted R-squared, F: variance share, complexity-adjusted share, and the overall model-vs-null test

Coefficient Interpretation
  Continuous predictor: expected change in y per unit of x, others held constant
  Categorical predictor (dummy): factor-level effect relative to the reference level
  Interaction term: change in slope across groups; x1 * x2 expands to x1 + x2 + x1:x2

Diagnostics and Assumptions
  Four-panel residual plots: plot(fit) diagnostic panels for linearity, variance, normality, leverage
  Normality of residuals: Shapiro-Wilk on residuals; the Q-Q plot is the primary visual check
  Multicollinearity (VIF): VIF above 5 or 10 signals linear redundancy with other predictors

Model Selection
  Stepwise selection: step() walks predictor sets by information criterion
  AIC and BIC: Akaike and Bayesian criteria for model-vs-model comparison

Prediction and Reporting
  predict with interval = "confidence": interval for the average response at given predictor values
  predict with interval = "prediction": interval for a single new observation; always wider than confidence
  Holdout sanity check: out-of-sample RMSE to confirm predictive generalisation
  Six-section regression report: question, predictors, diagnostics, fit, effects, decision