```mermaid
flowchart LR
A[Pick binary response y] --> B[Pick predictors x1..xk]
B --> C[Fit glm family binomial]
C --> D[Read summary: coef, deviance]
D --> E[Interpret as log-odds or odds ratios]
E --> F[Classification metrics and ROC]
F --> G{Fit acceptable?}
G -->|No| H[Refine predictors or threshold]
H --> C
G -->|Yes| I[Predict probabilities and report]
```
16 Logistic Regression
16.1 Logistic Regression in Context
Logistic regression models a binary outcome: churn or stay, default or repay, click or skip. The linear regression of Chapter 15 is not directly applicable because its predicted values are not constrained to the zero-to-one interval that a probability needs. Logistic regression keeps the linear-in-coefficients structure but passes the linear predictor through a logit link so that the output is always a probability (Berkson 1944; Nelder and Wedderburn 1972).
Binary outcomes are everywhere in business: whether a customer churns, whether a loan is repaid, whether a ticket is resolved on first contact, whether a prospect converts. Each is a zero-or-one question, and each calls for a model whose output is a probability.
Logistic regression is one member of the generalised linear model family. The family specifies a distribution for the response (binomial here) and a link function that connects the linear predictor to the mean of that distribution (here, the logit).
16.2 The Logistic Model
The model writes the log-odds of the positive outcome as a linear combination of predictors: log(p / (1 minus p)) equals beta0 plus beta1 x1 plus up to betak xk. Inverting the logit returns the probability p. Coefficients are estimated by maximum likelihood rather than ordinary least squares.
A coefficient of 0.5 in logistic regression does not say the probability rises by 0.5 per unit of x. It says the log-odds rise by 0.5, which means the odds are multiplied by exp(0.5), roughly 1.65. The probability change depends on where on the curve the predictor sits.
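A short base R sketch makes the point concrete. `plogis()` is base R's inverse logit; the coefficient value 0.5 and the starting log-odds are illustrative numbers, not from any fitted model:

```r
beta <- 0.5

# Multiplicative change in odds per unit of x:
odds_multiplier <- exp(beta)                 # about 1.65

# The probability change depends on where on the curve x sits.
p_change_tail <- plogis(-2 + beta) - plogis(-2)  # small change in the tail
p_change_mid  <- plogis( 0 + beta) - plogis( 0)  # largest change near p = 0.5

round(c(odds_multiplier, p_change_tail, p_change_mid), 3)
```

The same 0.5 shift in log-odds moves the probability by much more near p = 0.5 than in the tails, which is exactly why a coefficient cannot be read as a constant probability change.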
16.3 Fitting with glm
R fits the model with glm and family = binomial. The summary prints a coefficient table with z-statistics (not t), the null and residual deviance, and the AIC.
The summary reports each coefficient with its standard error and z-statistic, the null deviance (how well an intercept-only model fits), the residual deviance (how well the fitted model fits), the degrees of freedom, and the AIC. A large drop from null to residual deviance is the logistic analogue of a high R-squared.
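A minimal end-to-end sketch, on simulated data (the variable names `tenure`, `spend`, and `churn` echo the running example but the numbers here are made up):

```r
set.seed(42)
n <- 500
tenure <- rpois(n, 24)                 # months with the company (simulated)
spend  <- rnorm(n, 50, 15)             # monthly spend (simulated)

# True process: longer tenure lowers churn odds, higher spend raises them slightly.
p     <- plogis(1 - 0.08 * tenure + 0.01 * spend)
churn <- rbinom(n, 1, p)

fit <- glm(churn ~ tenure + spend, family = binomial)
summary(fit)                           # coefficients with z-statistics, deviances, AIC

deviance_drop <- fit$null.deviance - fit$deviance
deviance_drop                          # the logistic analogue of explained variation
</imports> -->
```

The deviance drop is always non-negative when the null model is nested in the fitted one, so the question is whether the drop is large relative to the degrees of freedom spent.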
16.4 Log-Odds and Odds Ratios
Raw coefficients are on the log-odds scale. Exponentiating a coefficient gives the odds ratio: the multiplicative change in odds per unit of the predictor. Odds ratios are easier to report to a business audience than log-odds.
An odds ratio of 1 means no effect. Above 1 means higher odds of the positive outcome per unit of the predictor; below 1 means lower odds. For tenure the odds ratio is below 1, which matches the intuition that longer-tenured customers are less likely to churn.
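Exponentiating is a one-liner. The sketch below refits a simulated model and uses `confint.default()`, which gives Wald intervals from base R (profile-likelihood intervals via `confint()` are also common but route through the MASS package):

```r
set.seed(42)
n <- 500
tenure <- rpois(n, 24)                          # simulated predictor
churn  <- rbinom(n, 1, plogis(1 - 0.08 * tenure))
fit <- glm(churn ~ tenure, family = binomial)

# Exponentiate coefficients and Wald confidence limits to get odds ratios.
exp(cbind(OR = coef(fit), confint.default(fit)))
# The tenure row should come out below 1 here: each extra month
# multiplies the churn odds down.
```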
16.5 Deviance, AIC, and Pseudo R-Squared
Logistic regression has no direct R-squared. The closest analogues are the drop from null to residual deviance and pseudo R-squared measures such as McFadden’s (McFadden 1974).
Deviance drop is the absolute improvement over the null. AIC is the comparable criterion when competing models have different predictors. McFadden’s R-squared puts the deviance drop on a zero-to-one scale for interpretation; McFadden’s own guideline is that 0.2 to 0.4 already indicates an excellent fit, well below the thresholds that a linear-regression R-squared would suggest.
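McFadden's measure needs no extra package: it is one minus the ratio of the fitted to the null deviance (equivalently, of the log-likelihoods). A sketch on simulated data:

```r
set.seed(1)
x <- rnorm(300)
y <- rbinom(300, 1, plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial)

# McFadden's pseudo R-squared: 1 - D_model / D_null.
mcfadden <- 1 - fit$deviance / fit$null.deviance
round(mcfadden, 3)
```

By McFadden's own guideline a value in the 0.2 to 0.4 range already indicates an excellent fit, so resist reading this number against linear-regression R-squared habits.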
16.6 Categorical Predictors and Interactions
Factors expand into dummy variables exactly as in Chapter 15. Interactions use the same x1 * x2 syntax. The interpretation changes only in that everything sits on the log-odds scale.
An interaction coefficient says how the slope of tenure differs between tier levels on the log-odds scale. Exponentiating gives the ratio of odds ratios across groups. Chapter 18 treats this idea as moderation in depth.
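A sketch of the interaction syntax on simulated data (the `tier` factor and its effect sizes are invented for illustration):

```r
set.seed(7)
n <- 600
tenure <- rpois(n, 24)
tier   <- factor(sample(c("basic", "premium"), n, replace = TRUE))
eta    <- 1 - 0.10 * tenure + 0.5 * (tier == "premium") +
          0.05 * tenure * (tier == "premium")
churn  <- rbinom(n, 1, plogis(eta))

fit <- glm(churn ~ tenure * tier, family = binomial)
coef(fit)                                 # includes the tenure:tierpremium term

# Exponentiated interaction: the ratio of tenure odds ratios across tiers.
exp(coef(fit)["tenure:tierpremium"])
```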
16.7 Classification and the Confusion Matrix
Once probabilities are in hand, a classification rule turns them into a decision by cutting at a threshold, often 0.5. The resulting confusion matrix tabulates true positives, false positives, true negatives, and false negatives, which feed accuracy, precision, recall, and F1.
The 0.5 default only makes sense when false positives and false negatives are equally costly. Raise the threshold to reduce false positives, lower it to catch more positives. The right value is set by the cost of each kind of error, not by the model.
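The whole pipeline from probabilities to metrics fits in a few lines of base R. A sketch on simulated data, with the threshold written as an explicit, changeable choice:

```r
set.seed(3)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + 1.5 * x))
fit   <- glm(y ~ x, family = binomial)
p_hat <- predict(fit, type = "response")

threshold <- 0.5                       # a business-cost decision, not a model fact
pred <- as.integer(p_hat >= threshold)

cm <- table(predicted = pred, actual = y)
tp <- cm["1", "1"]; fp <- cm["1", "0"]
fn <- cm["0", "1"]; tn <- cm["0", "0"]

accuracy  <- (tp + tn) / sum(cm)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)
round(c(accuracy = accuracy, precision = precision,
        recall = recall, f1 = f1), 3)
```

Re-running with a higher `threshold` trades recall for precision; watching the four metrics move is the quickest way to internalise the trade-off.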
16.8 ROC Curve and AUC
The ROC curve sweeps the threshold from 0 to 1 and plots true-positive rate against false-positive rate. The area under the curve (AUC) summarises classifier quality across all thresholds: 0.5 is chance, 1 is perfect.
AUC is the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case. Values around 0.7 are acceptable, 0.8 good, above 0.9 excellent in most business settings.
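That pairwise interpretation gives a direct way to compute AUC in base R, via the Mann-Whitney rank statistic, with no ROC package needed. A sketch on simulated data:

```r
set.seed(9)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1.5 * x))
p_hat <- predict(glm(y ~ x, family = binomial), type = "response")

# AUC as the proportion of (positive, negative) pairs in which the
# positive case receives the higher predicted probability.
r     <- rank(p_hat)
n_pos <- sum(y == 1)
n_neg <- sum(y == 0)
auc   <- (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
round(auc, 3)      # 0.5 would be chance, 1 perfect
```

In practice a dedicated package (pROC is a common choice) also draws the curve, but the rank formula is the definition itself.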
16.9 Residual Diagnostics
Residuals in logistic regression are less informative row by row than in linear regression because each observation is a zero or one. The standard practice is to work with binned residuals (average residual in each prediction bin) and to check leverage for extreme rows.
A healthy binned residual plot shows residuals scattered around zero across the probability range. Systematic drift at either end suggests a missing predictor or a non-linear effect of an existing one.
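Binned residuals are easy to compute by hand in base R: bin observations by predicted probability and average the raw residuals within each bin. A sketch on simulated data (ten equal-count bins is an arbitrary but common choice):

```r
set.seed(5)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.3 + x))
fit <- glm(y ~ x, family = binomial)

p_hat     <- fitted(fit)
resid_raw <- y - p_hat

# Ten equal-count bins over the predicted probabilities.
bins   <- cut(p_hat, breaks = quantile(p_hat, probs = seq(0, 1, 0.1)),
              include.lowest = TRUE)
binned <- tapply(resid_raw, bins, mean)
round(binned, 3)   # should hover near zero with no drift at either end
```

Plotting `binned` against bin midpoints gives the usual diagnostic picture; here the correctly specified model keeps every bin mean close to zero.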
16.10 Variable Selection and Multicollinearity
Stepwise selection and VIF both work on glm the same way they worked on lm in Chapter 15. AIC is the default comparison criterion; BIC is available with k = log(n).
A stepwise routine that drops x2 when x1 is in the model is doing roughly what VIF diagnostics would suggest. Reviewing both keeps the selected model defensible.
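A sketch of the stepwise side on simulated data, where `x2` is built to be nearly collinear with `x1` (variable names are hypothetical; VIF itself would come from the car package, as in Chapter 15):

```r
set.seed(11)
n  <- 400
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)          # nearly collinear with x1
x3 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.8 * x1 + 0.5 * x3))

full     <- glm(y ~ x1 + x2 + x3, family = binomial)
step_fit <- step(full, trace = 0)      # backward stepwise by AIC
formula(step_fit)                      # the redundant predictor should drop out

# For BIC instead of AIC: step(full, trace = 0, k = log(n))
```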
16.11 Predicting New Cases
predict.glm returns log-odds by default; use type = "response" for probabilities. A class label is produced by thresholding the probability.
Report the probability and the decision separately. A stakeholder can then see the model’s confidence and the rule that translated it into an action.
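A sketch of scoring new cases, on simulated training data. The default `type = "link"` returns log-odds; `type = "response"` returns probabilities, and `plogis()` maps one to the other:

```r
set.seed(21)
train   <- data.frame(x = rnorm(200))
train$y <- rbinom(200, 1, plogis(train$x))
fit <- glm(y ~ x, family = binomial, data = train)

new_cases <- data.frame(x = c(-2, 0, 2))
log_odds  <- predict(fit, newdata = new_cases)                   # link scale (default)
probs     <- predict(fit, newdata = new_cases, type = "response")
labels    <- as.integer(probs >= 0.5)   # illustrative threshold

data.frame(new_cases, log_odds, probs, labels)   # probability and decision, side by side
```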
16.12 Reporting a Logistic Regression
A logistic-regression report reuses the six-section skeleton from Chapters 11 to 15 and swaps the coefficient table for one that includes both raw coefficients and odds ratios.
1. Question and binary response
2. Predictor set and sample
3. Diagnostic view and binned residuals
4. Coefficient table with odds ratios and confidence intervals
5. Classification metrics at the chosen threshold plus AUC
6. Business decision with threshold rationale

The skeleton lines up directly with Chapter 15's regression report, which makes predictive and probabilistic studies comparable.
16.13 Summary
| Concept | Description |
|---|---|
| Model and Fit | |
| Logit model form | log(p/(1-p)) = linear combination of predictors |
| glm family binomial | Fit with glm(y ~ ..., family = binomial) |
| Deviance, AIC, pseudo R-squared | Deviance drop, Akaike criterion, McFadden's R-squared on a zero-to-one scale |
| Coefficient Interpretation | |
| Log-odds coefficient | Raw beta is the log-odds change per unit of the predictor |
| Odds ratio with CI | exp(beta) is the multiplicative change in odds; always report a CI |
| Categorical and interaction | Factor dummies and x1 * x2 work exactly as in lm, on the log-odds scale |
| Classification Performance | |
| Threshold choice | Business-cost decision, not a model parameter |
| Confusion matrix | Two-by-two tabulation of predicted versus actual class labels |
| ROC curve | Sweeps threshold and plots true-positive rate against false-positive rate |
| AUC | Area under the ROC curve; threshold-free summary of separability |
| Diagnostics and Selection | |
| Pearson and deviance residuals | Diagnostic residuals for GLMs; individually noisy because y is binary |
| Binned residual plot and leverage | Average residual in each prediction bin, plus leverage checks for extreme rows; the readable diagnostic |
| Stepwise AIC with VIF | Same tools as lm: stepwise selection plus variance inflation factors |
| Prediction and Reporting | |
| predict type response | Returns the predicted probability of the positive outcome |
| Six-section logistic report | Question, response, diagnostic, coefficients, classification, decision |