15  Multiple Regression

15.1 Multiple Regression in Context

Multiple regression extends the simple linear regression of Chapter 13 to more than one predictor. A single numeric response is modelled as a linear combination of several predictors, and the coefficients are estimated by ordinary least squares (Gauss 1809). The regression idea itself goes back to Galton’s study of inherited stature (Galton 1886).

Note: Two roles of a regression model

Descriptive, when the coefficients summarise how the response moves with each predictor after accounting for the others; predictive, when the fitted model is used to forecast the response for new rows. This chapter covers both uses; Chapters 16 to 22 build on the same machinery for richer models.

Tip: Where multiple regression fits in the triplet

Chapter 12 applied confirmatory tools to one variable, Chapter 13 to two variables, Chapter 14 to many variables without an identified response. Multiple regression picks one of those variables as a response and models it jointly from the rest.

15.2 The Multiple Regression Model

The model writes the expected value of y as a linear combination of k predictors plus an intercept: y = beta0 + beta1 x1 + beta2 x2 + ... + betak xk + error, where the error term has mean zero. OLS picks the coefficients that minimise the sum of squared residuals. Valid inference rests on a short list of assumptions: a linear mean structure, independent observations, constant residual variance, approximately normal residuals, and no exact linear dependence among the predictors.

flowchart LR
    A[Pick response y] --> B[Pick predictors x1..xk]
    B --> C[Fit OLS with lm]
    C --> D[Read summary: coef, R2, F]
    D --> E[Residual diagnostics]
    E --> F{Assumptions met?}
    F -->|No| G[Transform, add terms, or change model]
    G --> C
    F -->|Yes| H[Refine: selection, interactions]
    H --> I[Predict and report]

Warning: Linear here means linear in the coefficients

A multiple regression can carry polynomial terms, log transforms, and interactions, and still be linear in the sense that matters for OLS. What must not happen is a predictor that is an exact linear combination of the others.
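A short sketch (base R, simulated data; the variable names are illustrative) makes both points: lm happily fits log and polynomial terms, while an exactly redundant predictor is dropped with an NA coefficient.

```r
set.seed(42)
x1 <- runif(100, 1, 10)
x2 <- rnorm(100)
y  <- 2 + 0.5 * log(x1) + 1.5 * x2 - 0.3 * x2^2 + rnorm(100, sd = 0.2)

# Log and polynomial terms: still linear in the coefficients, so OLS applies
fit_ok <- lm(y ~ log(x1) + x2 + I(x2^2))
coef(fit_ok)

# An exact linear combination of the other predictors is not estimable
x3 <- 2 * x1 + 3 * x2            # exactly determined by x1 and x2
fit_bad <- lm(y ~ x1 + x2 + x3)
coef(fit_bad)["x3"]              # NA: dropped as aliased
```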

15.3 Fitting and Reading the Output

In R the model is fitted with lm and inspected with summary. The summary prints the coefficient table, residual summary, multiple and adjusted R-squared, and the overall F test.

Note: Five things to read off the summary

Estimated coefficient and its standard error, the t-statistic and p-value for each predictor, residual standard error, multiple and adjusted R-squared, and the overall F statistic with its p-value. Each block answers a distinct question about the fit.
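As a sketch, fitting and reading a model on simulated data (the spend/visits/value names echo the chapter's running example; the numbers here are invented):

```r
set.seed(1)
n <- 200
spend  <- runif(n, 0, 100)
visits <- rpois(n, lambda = 5)
value  <- 10 + 0.4 * spend + 3 * visits + rnorm(n, sd = 5)

fit <- lm(value ~ spend + visits)
summary(fit)                 # coefficient table, residual SE, R-squared, F test

# The same blocks, pulled out programmatically
coef(summary(fit))           # estimates, SEs, t statistics, p-values
summary(fit)$sigma           # residual standard error
summary(fit)$adj.r.squared   # adjusted R-squared
```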

15.4 Interpreting Coefficients

A coefficient in multiple regression is the expected change in the response for a one-unit change in that predictor, with the other predictors held constant. This “held constant” clause is what separates multiple regression from a set of simple regressions: the same slope can shrink, grow, or even change sign when other predictors are introduced.

Tip: Why the spend coefficient shrinks

Part of what looked like a spend effect in the simple model is carried by visits, which correlates with spend. The multiple-regression coefficient answers the cleaner question: how much does value move per unit of spend, at a fixed level of visits.
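A simulated sketch of the same phenomenon (variable names borrowed from the running example; the true spend slope here is set to 0.3):

```r
set.seed(7)
n <- 300
visits <- rpois(n, 6)
spend  <- 8 * visits + rnorm(n, sd = 10)   # spend and visits are correlated
value  <- 20 + 0.3 * spend + 5 * visits + rnorm(n, sd = 4)

coef(lm(value ~ spend))["spend"]           # inflated: absorbs part of the visits effect
coef(lm(value ~ spend + visits))["spend"]  # close to the true 0.3
```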

15.5 R-Squared, Adjusted R-Squared, and the F Test

R-squared reports the share of variance in y explained by the model. Adjusted R-squared penalises the addition of predictors that do not improve fit meaningfully. The overall F test compares the fitted model to an intercept-only null and answers whether the predictors jointly explain any variance (Fisher 1925).

Note: Adjusted R-squared is the fair comparison

R-squared never decreases when a predictor is added, even if the predictor is noise. Adjusted R-squared can decrease, which is why it is the right quantity to compare models with different numbers of predictors.
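A quick simulated check (base R): adding a pure-noise predictor cannot lower R-squared, while adjusted R-squared typically drops.

```r
set.seed(11)
n     <- 100
x     <- rnorm(n)
noise <- rnorm(n)            # unrelated to y by construction
y     <- 1 + 2 * x + rnorm(n)

fit1 <- lm(y ~ x)
fit2 <- lm(y ~ x + noise)

summary(fit1)$r.squared; summary(fit2)$r.squared          # second is never smaller
summary(fit1)$adj.r.squared; summary(fit2)$adj.r.squared  # second usually drops
```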

15.6 Residual Diagnostics

Before any coefficient is reported, the residuals are inspected. Calling R’s plot on a fitted model draws four diagnostic panels: residuals vs fitted (linearity), Q-Q plot (normality), scale-location (constant variance), and residuals vs leverage (influential points).

Warning: Diagnostics beat the p-value table

A clean coefficient table on a poorly specified model is worse than a messy table on a well-specified one. Always read the four panels before trusting any coefficient.
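A minimal sketch (simulated data) of producing the four panels, plus one numerical companion check:

```r
set.seed(3)
x1 <- runif(150)
x2 <- rnorm(150)
y  <- 1 + 2 * x1 - x2 + rnorm(150, sd = 0.3)
fit <- lm(y ~ x1 + x2)

par(mfrow = c(2, 2))   # lay all four panels out in one window
plot(fit)              # residuals vs fitted, Q-Q, scale-location, leverage

shapiro.test(residuals(fit))   # numerical check on residual normality
```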

15.7 Categorical Predictors

R turns a factor into a set of dummy variables automatically. One level is held out as the reference; each coefficient estimates the change in the response for that level relative to the reference, after accounting for the other predictors.

Tip: Setting the reference level

R uses alphabetical order by default. Use relevel(tier, ref = "Bronze") to force a business-meaningful baseline, which makes the coefficient table easier to read.
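A sketch with a simulated tier factor (note that "Bronze" happens to come first alphabetically here, so relevel simply makes the baseline choice explicit):

```r
set.seed(5)
tier  <- factor(sample(c("Bronze", "Silver", "Gold"), 120, replace = TRUE))
spend <- runif(120, 0, 50)
value <- 10 + 2 * (tier == "Silver") + 5 * (tier == "Gold") +
         0.3 * spend + rnorm(120)

levels(tier)    # "Bronze" "Gold" "Silver": alphabetical order

tier <- relevel(tier, ref = "Bronze")   # force the business baseline
fit  <- lm(value ~ tier + spend)
coef(fit)       # tierGold and tierSilver: effects relative to Bronze
```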

15.8 Interaction Terms

An interaction allows the effect of one predictor to depend on the level of another. In R, x1 * x2 expands to x1 + x2 + x1:x2 and adds one coefficient for the interaction. A full treatment of interactions as moderated effects is covered in Chapter 18.

Note: Interpreting an interaction coefficient

The interaction term is the difference in slope between levels. A positive interaction says the response rises faster with the predictor for that group than for the reference group. Add the main slope and the interaction to get the within-group slope.
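A simulated sketch (illustrative names; the true slope difference between groups is set to 1.5):

```r
set.seed(9)
n <- 200
group <- factor(sample(c("A", "B"), n, replace = TRUE))
x     <- runif(n, 0, 10)
# true slope: 1.0 for group A, 1.0 + 1.5 = 2.5 for group B
y <- 3 + 1.0 * x + 1.5 * x * (group == "B") + rnorm(n)

fit <- lm(y ~ x * group)   # expands to x + group + x:group
coef(fit)

# within-group slope for B = main slope + interaction
unname(coef(fit)["x"] + coef(fit)["x:groupB"])
```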

15.9 Variable Selection

When several candidate predictors are available, model selection picks a subset that balances fit against complexity. Stepwise selection via step walks the candidate space using either AIC (Akaike 1974) or BIC (Schwarz 1978). BIC is more conservative and prefers smaller models.

Warning: Stepwise is a tool, not a truth

Stepwise selection is sensitive to small changes in the data and can inflate significance of the retained predictors. Treat it as a starting point for a model that the analyst then justifies on theory and diagnostics.
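A sketch of both criteria on simulated data (only x1 and x2 carry signal; x3 and x4 are noise):

```r
set.seed(21)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 1.5 * d$x2 + rnorm(n)

full <- lm(y ~ ., data = d)
fit_aic <- step(full, trace = 0)               # AIC penalty (default, k = 2)
fit_bic <- step(full, trace = 0, k = log(n))   # BIC penalty: prefers smaller models

names(coef(fit_aic))
names(coef(fit_bic))
```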

15.10 Multicollinearity Revisited

Chapter 14 introduced VIF as a diagnostic for linearly redundant predictors. Applied to a fitted regression, a VIF above 5 signals meaningful correlation with the other predictors, and above 10 signals a problem that usually needs action.

Tip: Three common remedies

Drop one of a correlated pair, combine them into a single index, or keep both and switch to a regularised regression (ridge or lasso) where collinearity is handled explicitly. The right choice depends on whether the coefficients or the predictions matter more.
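In practice VIF usually comes from car::vif on the fitted model; the base-R sketch below (simulated data, hypothetical helper name) computes the same quantity from its definition, 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors.

```r
set.seed(13)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)   # strongly correlated with x1
x3 <- rnorm(n)
y  <- 1 + x1 + x2 + x3 + rnorm(n)

# VIF from first principles: regress each predictor on the others
vif_by_hand <- function(formula, data) {
  X <- model.matrix(formula, data)[, -1, drop = FALSE]   # drop the intercept
  sapply(colnames(X), function(v) {
    r2 <- summary(lm(X[, v] ~ X[, colnames(X) != v]))$r.squared
    1 / (1 - r2)
  })
}
vif_by_hand(y ~ x1 + x2 + x3, data.frame(y, x1, x2, x3))
# x1 and x2 show large VIFs; x3 stays near 1
```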

15.11 Prediction and Prediction Intervals

A fitted model is used for prediction with predict. The interval = "confidence" option returns an interval for the mean response at a given predictor value; interval = "prediction" returns the wider interval for a single new observation.

Note: Two intervals, two questions

The confidence interval answers “where is the average value for customers like these?” The prediction interval answers “where is a single new customer likely to land?” The prediction interval is always wider because it carries an extra source of variance.
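A minimal sketch on simulated data (illustrative names), showing both intervals at the same predictor value:

```r
set.seed(17)
n <- 120
spend <- runif(n, 0, 100)
value <- 15 + 0.5 * spend + rnorm(n, sd = 8)
fit <- lm(value ~ spend)

new <- data.frame(spend = 60)
predict(fit, new, interval = "confidence")   # where the mean response sits
predict(fit, new, interval = "prediction")   # where one new observation lands: wider
```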

15.12 Reporting a Multiple Regression

A regression report reuses the six-section skeleton from Chapters 11 to 14 and names the predictor set in place of the variable pair.

Tip: Six-section regression report

  1. Question and response
  2. Predictor set and sample
  3. Diagnostic panels and assumption checks
  4. Coefficient table with R-squared and F
  5. Effect sizes and confidence intervals
  6. Business decision with caveats

Keeping the skeleton stable across Chapters 11 to 15 lets readers compare an inferential study and a predictive model at a glance.

15.13 Summary

Summary of multiple-regression tools introduced in this chapter

Model and Fit
  Model form: y = beta0 + beta1 x1 + ... + betak xk + error, linear in the coefficients
  lm and summary: fit with lm(y ~ ...); summary returns the full coefficient and fit table
  R-squared, adjusted R-squared, F: variance share, complexity-adjusted share, and the overall model-vs-null test

Coefficient Interpretation
  Continuous predictor: expected change in y per unit of x, others held constant
  Categorical predictor (dummy): factor-level effect relative to the reference level
  Interaction term: change in slope across groups; x1 * x2 expands to x1 + x2 + x1:x2

Diagnostics and Assumptions
  Four-panel residual plots: plot(fit) diagnostic panels for linearity, variance, normality, leverage
  Normality of residuals: Shapiro-Wilk on residuals; the Q-Q plot is the primary visual check
  Multicollinearity (VIF): VIF above 5 or 10 signals linear redundancy with other predictors

Model Selection
  Stepwise selection: step() walks predictor sets by information criterion
  AIC and BIC: Akaike and Bayesian criteria for model-vs-model comparison

Prediction and Reporting
  predict with interval = "confidence": interval for the average response at given predictor values
  predict with interval = "prediction": interval for a single new observation; always wider than confidence
  Holdout sanity check: out-of-sample RMSE to confirm predictive generalisation
  Six-section regression report: question, predictors, diagnostics, fit, effects, decision