```mermaid
flowchart LR
  A[Frame problem] --> B[Train and test split]
  B --> C[Prepare numeric]
  C --> D[Prepare categorical]
  D --> E[Fit candidates]
  E --> F[k-fold CV]
  F --> G[Diagnostics]
  G --> H[Holdout evaluation]
  H --> I[Package and hand over]
```
19 Implementation of Methods with R
19.1 The Implementation Workflow
Chapters 15 to 18 each introduced one method. This chapter folds them into a single end-to-end pipeline in R: frame the problem, split the data, prepare the predictors, fit and compare candidate models, validate with resampling, check diagnostics, evaluate on a holdout, and package the analysis as a reusable function. The focus here is on the glue between steps, not the methods themselves.
The same eight boxes apply to a regression, a logistic model, or a moderated model. Making the pipeline explicit (rather than improvising from one project to the next) lets results be reviewed, reproduced, and audited without rereading code from scratch.
19.2 Problem Framing
The first decision is not a coding decision: it is whether the response is continuous (Chapter 15), binary (Chapter 16), measured through a mechanism (Chapter 17), or contingent on a boundary variable (Chapter 18). The data-type of Y, the theory behind the predictors, and the question the business wants answered together fix the method before a single line of R runs.
Continuous Y and no mechanism claim: linear regression. Binary Y: logistic regression. Continuous Y with a hypothesised intermediate variable on the causal path: mediation. Continuous or binary Y with a hypothesised condition under which the effect holds: moderation. Chapters 15-18 give the tools; the business question picks the chapter.
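The decision rule above can be sketched as a small dispatcher. This is a minimal illustration, not from the chapter: `choose_fit` and its arguments are hypothetical names, and mediation and moderation (Chapters 17-18) need their own model setup, so only the first two branches are shown.

```r
# Hypothetical dispatcher: the response type picks the fitting call.
choose_fit <- function(data, formula, type = c("continuous", "binary")) {
  type <- match.arg(type)
  switch(type,
         continuous = lm(formula, data = data),                     # Chapter 15
         binary     = glm(formula, data = data, family = binomial)) # Chapter 16
}

# Usage on placeholder data
set.seed(42)
d <- data.frame(x = runif(50))
d$y <- 1 + 2 * d$x + rnorm(50)   # continuous response
d$z <- rbinom(50, 1, 0.5)        # binary response
fit_cont <- choose_fit(d, y ~ x, "continuous")
fit_bin  <- choose_fit(d, z ~ x, "binary")
```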
19.3 Train and Test Split
A holdout is non-negotiable in predictive modelling (Hastie, Tibshirani and Friedman 2009). The standard split reserves 70 to 80 percent for training and the remainder for an honest evaluation. Random indexing in base R is enough; no extra package is required.
The same seed plus the same data reproduces the split exactly. Record the seed in the report so a reviewer can reconstruct the training and test sets without guessing.
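A minimal sketch of the split in base R, on a placeholder data frame (`df` and the 70 percent fraction are illustrative choices):

```r
set.seed(42)                                  # record this seed in the report
df <- data.frame(x = runif(100))              # placeholder data
df$y <- 2 + 3 * df$x + rnorm(100)

train_idx <- sample(nrow(df), size = floor(0.7 * nrow(df)))  # base R indexing
train <- df[train_idx, ]                      # 70 percent for training
test  <- df[-train_idx, ]                     # remainder for honest evaluation
```

Rerunning this with the same seed and the same data reproduces the split exactly, which is the property the report relies on.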
19.4 Preparing Numeric Predictors
Scaling is almost always worth the small cost: it makes coefficients comparable in magnitude, stabilises numerical optimisation, and removes scale-induced artefacts in regularised or distance-based methods. Critically, the scaling parameters must be learned on the training data and applied to the test set unchanged.
Fitting the scaler on the combined data before the split leaks information from the test set into training. The means and standard deviations must come from the training split only.
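One way to keep the scaler on the right side of the split, sketched on placeholder data (variable names are illustrative):

```r
set.seed(42)
df <- data.frame(x = runif(100, 0, 1000))     # placeholder, deliberately large scale
train <- df[1:70, , drop = FALSE]
test  <- df[71:100, , drop = FALSE]

mu   <- mean(train$x)                         # parameters from training rows only
sdev <- sd(train$x)
train$x_scaled <- (train$x - mu) / sdev
test$x_scaled  <- (test$x  - mu) / sdev       # reuse training parameters unchanged
```

The test column is not re-centred on its own mean; it is transformed with the training parameters, so no test-set information reaches the fit.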
19.5 Preparing Categorical Predictors
Factor handling has two recurring pitfalls: a reference level chosen alphabetically that does not match the business default, and rare levels with too few rows to estimate reliably. Both are fixed with relevel() and explicit pooling before any model is fit.
All subsequent dummy coefficients are differences from the reference. Choosing a business-meaningful reference (the default tier, the control arm, the baseline channel) makes the coefficient table far easier to read.
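Both fixes in one sketch, on a hypothetical `tier` variable (the level names and the pooling threshold of 5 rows are illustrative):

```r
tier <- factor(c(rep("basic", 50), rep("premium", 30),
                 rep("trial", 3),  rep("beta", 2)))

# Pool levels with too few rows into "Other" before any model is fit.
counts <- table(tier)
rare   <- names(counts)[counts < 5]
levels(tier)[levels(tier) %in% rare] <- "Other"   # merges rare levels

# Make the business default the reference level.
tier <- relevel(tier, ref = "basic")
```

After this, every dummy coefficient reads as a difference from `basic`, and no coefficient rests on a handful of rows.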
19.6 Candidate Models and AIC Comparison
Rather than committing to a single functional form at the start, fit two or three defensible candidates and compare them on AIC (Akaike 1974). AIC balances fit against complexity and is directly comparable across nested and non-nested candidates with the same response.
A reduction in AIC smaller than roughly 2 is weak evidence. A reduction of 10 or more is substantial. Pair AIC with a test-set metric before choosing a winner.
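A comparison of two candidates on simulated data where the quadratic term genuinely belongs in the model (the data-generating line is an assumption of this sketch):

```r
set.seed(42)
x <- runif(100)
y <- 2 + 3 * x + 4 * x^2 + rnorm(100, sd = 0.3)   # true curve is quadratic
train <- data.frame(x = x, y = y)

m1 <- lm(y ~ x, data = train)             # linear candidate
m2 <- lm(y ~ x + I(x^2), data = train)    # quadratic candidate
AIC(m1, m2)                               # lower is better, subject to the gap rule
```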
19.7 k-Fold Cross-Validation
AIC and adjusted R-squared are training-set criteria. To estimate out-of-sample error before touching the holdout, partition the training data into k folds, fit on k minus one folds, and predict on the held-out fold (Stone 1974). Averaging the k out-of-fold errors gives a stable CV estimate.
Five-fold CV is fast enough to rerun during exploration and stable enough to compare nearby candidates. Ten-fold CV reduces variance at roughly double the cost; leave-one-out is rarely worth the compute for typical business datasets.
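The loop is short enough to write by hand in base R. A minimal 5-fold sketch on placeholder data, using RMSE as the fold-level error:

```r
set.seed(42)
train <- data.frame(x = runif(100))
train$y <- 2 + 3 * train$x + rnorm(100, sd = 0.5)

k     <- 5
folds <- sample(rep(1:k, length.out = nrow(train)))   # random fold labels

cv_rmse <- sapply(1:k, function(i) {
  fit  <- lm(y ~ x, data = train[folds != i, ])       # fit on k-1 folds
  pred <- predict(fit, newdata = train[folds == i, ]) # predict held-out fold
  sqrt(mean((train$y[folds == i] - pred)^2))          # out-of-fold RMSE
})
mean(cv_rmse)                                         # averaged CV estimate
```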
19.8 Diagnostics of the Chosen Model
Once a candidate is picked, the four-panel diagnostic plot from Chapter 15 is the first check: residuals versus fitted for linearity, Q-Q for normality of residuals, scale-location for constant variance, and residuals versus leverage for influential points. For a glm, swap in binned residuals (Chapter 16).
A clean residual-versus-fitted panel does not excuse a heavy-tailed Q-Q. Pattern in any one panel is a reason to reconsider the functional form, add a predictor, or switch to a more suitable GLM.
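For an `lm` fit, all four panels come from a single call (the toy fit below is a placeholder):

```r
set.seed(42)
d   <- data.frame(x = runif(100))
d$y <- 1 + 2 * d$x + rnorm(100, sd = 0.4)
fit <- lm(y ~ x, data = d)

op <- par(mfrow = c(2, 2))   # 2x2 grid for the four panels
plot(fit)                    # residuals vs fitted, Q-Q, scale-location, leverage
par(op)                      # restore previous plotting settings
```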
19.9 Holdout Evaluation
The holdout is the final, honest measure of how the model will perform on data it has not seen. Predict on the test split once, compute the appropriate metric (RMSE for continuous Y, accuracy or AUC for binary Y), and report the number alongside the training and CV estimates for context.
A train RMSE far below the test RMSE suggests overfitting: the model has memorised training noise. The fix is fewer predictors, regularisation, or more data, not a nicer diagnostic plot.
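A sketch of the single holdout evaluation for a continuous response, reporting the train and test RMSE side by side (data and split sizes are illustrative):

```r
set.seed(42)
df <- data.frame(x = runif(150))
df$y  <- 2 + 3 * df$x + rnorm(150, sd = 0.5)
train <- df[1:105, ]                         # 70/30 split in this sketch
test  <- df[106:150, ]

fit  <- lm(y ~ x, data = train)
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

train_rmse <- rmse(train$y, fitted(fit))
test_rmse  <- rmse(test$y, predict(fit, newdata = test))  # predict once, report
c(train = train_rmse, test = test_rmse)      # a large gap flags overfitting
```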
19.10 Packaging as a Reusable Function
Once the pipeline is working, wrap it in a function. The function takes raw data and a formula, performs the split, prepares the predictors, fits the model, and returns the fit along with its metrics. It turns a one-off notebook into an audited routine that can be rerun on next month's extract.
Once the workflow is a function, it can be called with a different dataset, a different formula, or a different seed without copy-and-paste. Bugs get fixed in one place. Argument defaults document the choices the team standardised on.
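A minimal sketch of such a wrapper. The name `run_pipeline`, the arguments, and the plain `lm` fit (standing in for the full preparation steps) are all assumptions of this example:

```r
run_pipeline <- function(data, formula, train_frac = 0.7, seed = 42) {
  set.seed(seed)                                   # recorded, reproducible seed
  idx   <- sample(nrow(data), floor(train_frac * nrow(data)))
  train <- data[idx, ]
  test  <- data[-idx, ]

  fit  <- lm(formula, data = train)
  rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
  yvar <- all.vars(formula)[1]                     # response name from formula

  list(fit        = fit,
       train_rmse = rmse(train[[yvar]], fitted(fit)),
       test_rmse  = rmse(test[[yvar]],  predict(fit, newdata = test)),
       seed       = seed)
}

# Rerun on next month's extract by changing only the arguments.
df  <- data.frame(x = runif(120))
df$y <- 1 + 2 * df$x + rnorm(120, sd = 0.3)
res <- run_pipeline(df, y ~ x)
```

The defaults (`train_frac = 0.7`, `seed = 42`) document the choices the team standardised on, and the returned seed lets a reviewer reconstruct the split.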
19.11 Reporting and Handover
The deliverable is not just the model object; it is the model plus everything a reviewer or a downstream team needs to run it again.
(1) Data source and extraction date, (2) response variable and predictor definitions, (3) preparation steps including scaler parameters and pooled levels, (4) final formula and coefficient table, (5) CV and holdout metrics, (6) diagnostic plots, (7) the fitting function and its seed, (8) known caveats and recommended refresh cadence. A short report that hits these eight points makes the model auditable and re-runnable by someone who was not in the room.
19.12 Summary
| Concept | Description |
|---|---|
| Framing | |
| Response-type decision | Continuous, binary, mediated, or moderated; the response type fixes the method |
| Method selector | lm for continuous, glm for binary, mediation for mechanism, moderation for boundary |
| Preparation | |
| Train and test split | 70/30 or 80/20 random split with a recorded seed |
| Numeric scaling on train only | Fit centre and scale on the training split; apply to test unchanged |
| Factor releveling | Choose a business-meaningful reference level with relevel() |
| Rare-level pooling | Collapse low-share levels into Other before fitting |
| Modelling | |
| AIC comparison | Lower AIC is better; a gap below 2 is weak, above 10 is strong |
| k-fold cross-validation | Fit on k-1 folds, predict on the held-out fold, average the errors |
| Evaluation | |
| Four-panel diagnostics | Residual vs fitted, Q-Q, scale-location, leverage for lm |
| Holdout metric | RMSE for continuous Y; AUC or accuracy for binary Y |
| Train-to-test gap | Large gap signals overfitting; reduce predictors or regularise |
| Delivery | |
| Wrapping the pipeline as a function | Reusable function turns a notebook into an audited routine |
| Handover checklist | Data source, definitions, prep, formula, metrics, diagnostics, function, caveats |
| Reproducibility seed | Record the seed so the split and the fit can be reproduced exactly |