```mermaid
flowchart LR
    A[Pick response y] --> B[Pick predictors x1..xk]
    B --> C[Fit OLS with lm]
    C --> D[Read summary: coef, R2, F]
    D --> E[Residual diagnostics]
    E --> F{Assumptions met?}
    F -->|No| G[Transform, add terms, or change model]
    G --> C
    F -->|Yes| H[Refine: selection, interactions]
    H --> I[Predict and report]
```
15 Multiple Regression
15.1 Multiple Regression in Context
Multiple regression extends the simple linear regression of Chapter 13 to more than one predictor. A single numeric response is modelled as a linear combination of several predictors, and the coefficients are estimated by ordinary least squares (Gauss 1809). The regression idea itself goes back to Galton’s study of inherited stature (Galton 1886).
The model serves two roles: descriptive, when the coefficients summarise how the response moves with each predictor after accounting for the others, and predictive, when the fitted model is used to forecast the response for new rows. This chapter covers both uses; Chapters 16 to 22 build on the same machinery for richer models.
Chapter 12 applied confirmatory tools to one variable, Chapter 13 to two variables, Chapter 14 to many variables without an identified response. Multiple regression picks one of those variables as a response and models it jointly from the rest.
15.2 The Multiple Regression Model
The model writes the expected value of y as a linear combination of k predictors plus an intercept: y = beta0 + beta1 x1 + beta2 x2 + ... + betak xk + error, where the error has mean zero. OLS picks the coefficients that minimise the sum of squared residuals. Valid inference rests on a short list of assumptions: a linear mean structure, independent observations, constant residual variance, approximately normal residuals, and no exact linear dependence among the predictors.
A multiple regression can carry polynomial terms, log transforms, and interactions, and still be linear in the sense that matters for OLS. What must not happen is a predictor that is an exact linear combination of the others.
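To make the form concrete, the sketch below simulates a small data frame that obeys the model exactly. The names customers, spend, visits, and value, and every coefficient, are invented for illustration; later snippets in this chapter reuse them.

```r
## Simulate data that follow y = beta0 + beta1 x1 + beta2 x2 + error.
## All names and coefficients here are illustrative, not from a real dataset.
set.seed(42)
n      <- 200
spend  <- runif(n, 10, 100)                     # predictor x1
visits <- 5 + 0.10 * spend + rnorm(n, sd = 2)   # predictor x2, correlated with spend
value  <- 50 + 2 * spend + 8 * visits + rnorm(n, sd = 20)  # response with mean-zero error
customers <- data.frame(value, spend, visits)
```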
15.3 Fitting and Reading the Output
In R the model is fitted with lm and inspected with summary. The summary prints the coefficient table, residual summary, multiple and adjusted R-squared, and the overall F test.
For each predictor the coefficient table shows the estimated coefficient, its standard error, and the t-statistic with its p-value; below the table sit the residual standard error, the multiple and adjusted R-squared, and the overall F statistic with its p-value. Each block answers a distinct question about the fit.
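As a minimal sketch using the simulated customers frame from Section 15.2, the whole fit-and-read loop is two calls:

```r
## Fit by OLS and print the full summary output.
fit <- lm(value ~ spend + visits, data = customers)
summary(fit)   # coefficient table, residual SE, R-squared values, overall F test
```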
15.4 Interpreting Coefficients
A coefficient in multiple regression is the expected change in the response for a one-unit change in that predictor, with the other predictors held constant. This “held constant” clause is what separates multiple regression from a set of simple regressions: the same slope can shrink, grow, or even change sign when other predictors are introduced.
Part of what looks like a spend effect in a simple regression of value on spend is carried by visits, which correlates with spend. The multiple-regression coefficient answers the cleaner question: how much does value move per unit of spend at a fixed level of visits?
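With the simulated frame from Section 15.2, the shift in the spend slope can be seen directly; the comparison below is a sketch on invented data, not output from a real study:

```r
## Simple slope vs partial slope: visits absorbs part of the apparent spend effect.
coef(lm(value ~ spend, data = customers))["spend"]           # simple regression slope
coef(lm(value ~ spend + visits, data = customers))["spend"]  # slope with visits held fixed
```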
15.5 R-Squared, Adjusted R-Squared, and the F Test
R-squared reports the share of variance in y explained by the model. Adjusted R-squared penalises the addition of predictors that do not improve fit meaningfully. The overall F test compares the fitted model to an intercept-only null and answers whether the predictors jointly explain any variance (Fisher 1925).
R-squared never decreases when a predictor is added, even if the predictor is noise. Adjusted R-squared can decrease, which is why it is the right quantity to compare models with different numbers of predictors.
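All three quantities can be read off the summary object directly; a short sketch, assuming the fit object from Section 15.3:

```r
## Extract the fit statistics from the summary object.
s <- summary(fit)
s$r.squared      # share of variance in y explained by the model
s$adj.r.squared  # penalised for the number of predictors
s$fstatistic     # F value with its numerator and denominator df
```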
15.6 Residual Diagnostics
Before any coefficient is reported, the residuals are inspected. R’s plot on a fitted model draws four diagnostic panels: residuals vs fitted (linearity and equal variance), Q-Q plot (normality), scale-location (variance), and residuals vs leverage (influential points).
A clean coefficient table on a poorly specified model is worse than a messy table on a well-specified one. Always read the four panels before trusting any coefficient.
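A minimal sketch, assuming the fit object from Section 15.3; the par call puts all four panels on one device:

```r
## Draw the four diagnostic panels in a 2-by-2 grid.
par(mfrow = c(2, 2))
plot(fit)   # residuals vs fitted, Q-Q, scale-location, residuals vs leverage
par(mfrow = c(1, 1))   # restore the default layout
```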
15.7 Categorical Predictors
R turns a factor into a set of dummy variables automatically. One level is held out as the reference; each coefficient estimates the change in the response for that level relative to the reference, after accounting for the other predictors.
R orders factor levels alphabetically by default, so the alphabetically first level becomes the reference. Use relevel(tier, ref = "Bronze") to force a business-meaningful baseline, which makes the coefficient table easier to read.
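A sketch of the mechanics, adding an invented three-level tier factor to the simulated customers frame:

```r
## Add an illustrative loyalty-tier factor and make the baseline explicit.
customers$tier <- factor(sample(c("Bronze", "Silver", "Gold"), n, replace = TRUE))
levels(customers$tier)   # alphabetical: Bronze, Gold, Silver
customers$tier <- relevel(customers$tier, ref = "Bronze")   # explicit baseline
                          # (Bronze is already first alphabetically here)
fit_tier <- lm(value ~ spend + tier, data = customers)
coef(fit_tier)   # tierGold and tierSilver are offsets from the Bronze reference
```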
15.8 Interaction Terms
An interaction allows the effect of one predictor to depend on the level of another. In R, x1 * x2 expands to x1 + x2 + x1:x2 and adds one coefficient for the interaction. A full treatment of interactions as moderated effects is covered in Chapter 18.
The interaction term is the difference in slope between levels. A positive interaction says the response rises faster with the predictor for that group than for the reference group. Add the main slope and the interaction to get the within-group slope.
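Continuing the sketch with the invented tier factor:

```r
## Let the spend slope differ by tier: spend * tier = spend + tier + spend:tier.
fit_int <- lm(value ~ spend * tier, data = customers)
coef(fit_int)
## Within-group slope for Gold = coefficient for spend + coefficient for spend:tierGold.
```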
15.9 Variable Selection
When several candidate predictors are available, model selection picks a subset that balances fit against complexity. Stepwise selection via step walks the candidate space using either AIC (Akaike 1974) or BIC (Schwarz 1978). BIC is more conservative and prefers smaller models.
Stepwise selection is sensitive to small changes in the data and can inflate significance of the retained predictors. Treat it as a starting point for a model that the analyst then justifies on theory and diagnostics.
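A sketch of both criteria on the simulated frame; step's k argument selects the penalty:

```r
## Stepwise search from the full model; k = 2 gives AIC, k = log(n) gives BIC.
full <- lm(value ~ spend + visits + tier, data = customers)
step(full, direction = "both", k = 2)                     # AIC
step(full, direction = "both", k = log(nrow(customers))) # BIC, more conservative
```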
15.10 Multicollinearity Revisited
Chapter 14 introduced VIF as a diagnostic for linearly redundant predictors. Applied to a fitted regression, a VIF above 5 signals meaningful correlation with the other predictors, and above 10 signals a problem that usually needs action.
Drop one of a correlated pair, combine them into a single index, or keep both and switch to a regularised regression (ridge or lasso) where collinearity is handled explicitly. The right choice depends on whether the coefficients or the predictions matter more.
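One common implementation is vif from the car package; a minimal sketch, assuming the fit object from Section 15.3:

```r
## VIF per predictor, computed with the car package.
# install.packages("car")   # once, if not installed
library(car)
vif(fit)   # above 5: meaningful correlation; above 10: usually needs action
```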
15.11 Prediction and Prediction Intervals
A fitted model is used for prediction with predict. The interval = "confidence" option returns an interval for the mean response at a given predictor value; interval = "prediction" returns the wider interval for a single new observation.
The confidence interval answers “where is the average value for customers like these?” The prediction interval answers “where is a single new customer likely to land?” The prediction interval is always wider because it carries an extra source of variance.
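A minimal sketch, assuming the fit object from Section 15.3; the new row's values are invented:

```r
## Mean-response interval vs single-observation interval at the same x values.
new_row <- data.frame(spend = 60, visits = 12)
predict(fit, newdata = new_row, interval = "confidence")   # average customer like this
predict(fit, newdata = new_row, interval = "prediction")   # one new customer, wider
```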
15.12 Reporting a Multiple Regression
A regression report reuses the six-section skeleton from Chapters 11 to 14 and names the predictor set in place of the variable pair.
(1) Question and response, (2) predictor set and sample, (3) diagnostic panels and assumption checks, (4) coefficient table with R-squared and F, (5) effect sizes and confidence intervals, (6) business decision with caveats. Keeping the skeleton stable across Chapters 11 to 15 lets readers compare an inferential study and a predictive model at a glance.
15.13 Summary
| Concept | Description |
|---|---|
| Model and Fit | |
| Model form | y = beta0 + beta1 x1 + ... + betak xk + error, linear in coefficients |
| lm and summary | Fit with lm(y ~ ...); summary returns the full coefficient and fit table |
| R-squared, adjusted R-squared, F | Variance share, adjusted share, and the overall model-vs-null test |
| Coefficient Interpretation | |
| Continuous predictor | Expected change in y per unit of x, others held constant |
| Categorical predictor (dummy) | Factor level effect relative to the reference level |
| Interaction term | Change in slope across groups; x1 * x2 expands to x1 + x2 + x1:x2 |
| Diagnostics and Assumptions | |
| Four-panel residual plots | plot(fit) diagnostic panels for linearity, variance, normality, leverage |
| Normality of residuals | Shapiro-Wilk on residuals; Q-Q plot is the primary visual check |
| Multicollinearity (VIF) | VIF above 5 or 10 signals linear redundancy with other predictors |
| Model Selection | |
| Stepwise selection | step() walks predictor sets by information criterion |
| AIC and BIC | Akaike and Bayesian criteria for model-vs-model comparison |
| Prediction and Reporting | |
| predict with confidence | Interval for the average response at given predictor values |
| predict with prediction | Interval for a single new observation; always wider than confidence |
| Holdout sanity check | Out-of-sample RMSE to confirm predictive generalisation |
| Six-section regression report | Question, predictors, diagnostic, fit, effect, decision |