6  Data Types by Levels of Measurement

6.1 Why Levels of Measurement Matter

The level of measurement of a variable determines which statistics, visualisations, and models are legitimate. A variable that looks numeric is not automatically suitable for arithmetic: PIN codes, customer IDs, and the numeric codes of satisfaction ratings are stored as numbers but behave nothing like revenue in rupees. Applying the wrong statistic produces results that look plausible, pass most sanity checks, and can mislead every downstream decision that relies on them.

Note: From measurement theory to analytics

The theory of measurement levels was formalised by the psychologist S. S. Stevens in a 1946 paper in Science. Stevens’ four levels (nominal, ordinal, interval, ratio) remain the working taxonomy in statistics, business analytics, and the social sciences. Every statistical method carries an implicit assumption about the level of its inputs; understanding that assumption is the first step to using the method correctly.

Important: The practical consequence

Level of measurement is not a philosophical footnote. It chooses the statistic (mean versus median versus mode), the chart (bar versus histogram), and the model (linear regression versus logistic versus ordinal). The same data, classified at different levels, will answer different questions.

6.2 Stevens’ Four Levels

Stevens’ 1946 classification arranges measurement into four nested levels. Each level admits the operations of the level below plus one new operation. Moving up the hierarchy adds mathematical structure and expands the set of permissible statistics.

| Level | Equality | Order | Equal intervals | True zero | Indian business example |
|---|---|---|---|---|---|
| Nominal | Yes | No | No | No | State code (TN, KA, MH) |
| Ordinal | Yes | Yes | No | No | CSAT rating (Low, Medium, High) |
| Interval | Yes | Yes | Yes | No | Temperature in °C at warehouse |
| Ratio | Yes | Yes | Yes | Yes | Monthly revenue in ₹ |

6.3 Nominal Scale

The nominal scale classifies observations into categories with no inherent order. The only valid operation is checking whether two values are equal.

Note: Typical nominal variables

State of residence (TN, KA, MH, DL), product category (electronics, apparel, grocery), payment method (UPI, credit card, net banking, cash-on-delivery), account status (active, dormant, closed), blood group, gender. The categories are mutually exclusive and collectively exhaustive; the numeric codes sometimes assigned to them (1, 2, 3) are arbitrary labels, not measurements.

Tip: Permissible operations

Frequency counts, mode, chi-square tests of association, proportions. Reporting a mean over numeric codes is meaningless: the “average payment method” has no interpretation. In R, nominal variables are represented using factor() without the ordered argument.
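The operations listed for nominal data can be sketched in base R as follows. The `payment` and `state` vectors are hypothetical sample data invented for illustration, not figures from the chapter.

```r
# Hypothetical sample: payment method and state for 110 orders.
payment <- factor(rep(c("UPI", "Card", "UPI", "Card"), times = c(35, 20, 25, 30)))
state   <- factor(rep(c("TN", "TN", "KA", "KA"), times = c(35, 20, 25, 30)))

# factor() without ordered = TRUE gives an unordered (nominal) factor.
is.ordered(payment)

# Frequency counts, proportions, and the mode (most frequent category).
counts       <- table(payment)
shares       <- prop.table(counts)
modal_method <- names(which.max(counts))

# Chi-square test of association between two nominal variables.
assoc <- chisq.test(table(payment, state))
assoc$p.value
```

Base R has no built-in function for the statistical mode, so the sketch takes the name of the largest cell in the frequency table.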

6.4 Ordinal Scale

The ordinal scale adds order to the nominal scale. Values can be ranked, but the distance between consecutive ranks is not guaranteed to be equal.

Note: Typical ordinal variables

Five-point Likert items (Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree), credit-risk bands (AAA, AA, A, BBB, …), NPS buckets (Detractor, Passive, Promoter), customer tiers (Bronze, Silver, Gold, Platinum), education level (undergraduate, postgraduate, doctorate). The gap between “Agree” and “Strongly Agree” is not necessarily the same as the gap between “Neutral” and “Agree”.

Tip: Permissible operations

Median, mode, percentiles, rank-based tests (Mann-Whitney, Kruskal-Wallis, Spearman correlation). Means are technically not valid, although in practice a mean of Likert items is often reported and treated as approximately interval. In R, ordinal variables are represented using factor(..., ordered = TRUE).
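A minimal sketch of order-based summaries on an ordered factor is given below. The responses and the `tenure_months` vector are hypothetical. It uses `quantile()` with `type = 1`, one of the two quantile types that base R accepts for ordered factors, so the result is always an observed category.

```r
likert_levels <- c("Strongly Disagree", "Disagree", "Neutral",
                   "Agree", "Strongly Agree")

# Hypothetical responses to a single five-point Likert item.
responses <- factor(
  c("Agree", "Neutral", "Strongly Agree", "Agree", "Disagree", "Agree", "Neutral"),
  levels = likert_levels, ordered = TRUE
)

# Median category; type = 1 returns one of the observed levels.
median_response <- quantile(responses, probs = 0.5, type = 1)

# Order comparisons are defined for ordered factors.
n_agree_or_better <- sum(responses >= "Agree")

# Spearman correlation uses only the ranks of the level codes.
tenure_months <- c(24, 10, 40, 18, 3, 30, 12)  # hypothetical second variable
rho <- cor(as.integer(responses), tenure_months, method = "spearman")
```

Converting to integer codes is acceptable here only because Spearman's method discards everything except rank order.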

6.5 Interval Scale

The interval scale adds equal spacing between adjacent values, so differences are meaningful. It lacks a true zero, which means ratios are not meaningful.

Note: Typical interval variables

Temperature in Celsius or Fahrenheit, calendar year (2025 CE), IQ score, credit score on a bounded band (300 to 900 in the CIBIL range when treated as interval). The difference between 20°C and 25°C equals the difference between 30°C and 35°C, but 20°C is not “twice as hot” as 10°C because 0°C is a convention, not an absence of temperature.

Tip: Permissible operations

Mean, standard deviation, Pearson correlation, t-tests, linear regression. Ratios and coefficients of variation are not valid because the zero point is arbitrary. In R, interval variables are stored as numeric.
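The arbitrariness of the zero point can be checked numerically. The sketch below uses three hypothetical warehouse readings and re-expresses them in Fahrenheit and Kelvin: relative differences survive the change of scale, but ratios of the values do not.

```r
celsius    <- c(10, 20, 35)          # hypothetical readings
fahrenheit <- celsius * 9 / 5 + 32   # same temperatures, different zero and unit
kelvin     <- celsius + 273.15       # same unit size, true zero

# Ratios of differences are preserved across interval scales.
diff_ratio_c <- diff(celsius)[2] / diff(celsius)[1]
diff_ratio_f <- diff(fahrenheit)[2] / diff(fahrenheit)[1]

# Ratios of the values themselves depend on where zero sits.
ratio_c <- celsius[2] / celsius[1]        # 2
ratio_f <- fahrenheit[2] / fahrenheit[1]  # 1.36
ratio_k <- kelvin[2] / kelvin[1]          # about 1.035
```

Only the Kelvin ratio has a physical interpretation, because only Kelvin has a zero that denotes absence of the quantity.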

6.6 Ratio Scale

The ratio scale adds a true zero that represents the absence of the quantity. All arithmetic operations are meaningful, including ratios.

Note: Typical ratio variables

Monthly revenue in ₹, units sold, web-session duration in seconds, customer age in years, number of app sessions per week, distance travelled, inventory count. If a store’s sales doubled from ₹5 lakh to ₹10 lakh, the ratio is meaningful because ₹0 denotes no sales.

Tip: Permissible operations

All arithmetic, geometric mean, harmonic mean, coefficient of variation, log transforms, any parametric or non-parametric inferential test, any regression model. In R, ratio variables are stored as numeric (or integer when discrete). The distinction between interval and ratio seldom affects routine analytics, but matters for metrics like growth rates and elasticities that depend on ratios.
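As an illustration of ratio-only statistics, the sketch below computes growth factors, a geometric mean, and a coefficient of variation on a hypothetical revenue series. It then confirms that the coefficient of variation is unchanged when the unit changes from ₹ lakh to ₹.

```r
revenue_lakh <- c(5, 10, 20)  # hypothetical monthly revenue in ₹ lakh

# Period-over-period growth factors are ratios of consecutive values.
growth <- revenue_lakh[-1] / head(revenue_lakh, -1)

# Geometric mean via logs; valid because all values are positive.
geo_mean <- exp(mean(log(revenue_lakh)))

# Coefficient of variation: SD relative to the mean.
cv_lakh  <- sd(revenue_lakh) / mean(revenue_lakh)

# Rescaling the unit leaves the CV unchanged on a ratio scale.
revenue_rupees <- revenue_lakh * 1e5
cv_rupees <- sd(revenue_rupees) / mean(revenue_rupees)
```

The same invariance would fail for an interval variable, where adding a constant shifts the mean but not the standard deviation.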

6.7 The Hierarchy of Measurement Levels

Each level includes the operations of the level below and adds one more. A variable measured at the ratio level can always be degraded to a lower level by discarding information (for example, binning revenue into “Low/Medium/High”), but the reverse is never possible.

```mermaid
flowchart LR
  A[Nominal<br/>equality only] --> B[Ordinal<br/>adds order]
  B --> C[Interval<br/>adds equal spacing]
  C --> D[Ratio<br/>adds true zero]
```

Important: Information loss is one-way

Always collect at the highest level the measurement process supports and bin later if needed. A rating captured as an integer from 0 to 10 can be analysed as continuous, ordinal, or nominal; a rating captured only as “Good/Neutral/Bad” cannot be recovered.
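The one-way nature of binning can be sketched with `cut()`. The ratings and the cut points (3 and 6) are hypothetical choices made for illustration; the chapter does not prescribe particular thresholds.

```r
rating <- c(0, 3, 4, 6, 7, 9, 10)  # hypothetical 0-10 ratings

# Degrade the numeric rating to an ordinal band.
# Intervals are right-closed: (-Inf, 3], (3, 6], (6, 10].
band <- cut(
  rating,
  breaks = c(-Inf, 3, 6, 10),
  labels = c("Bad", "Neutral", "Good"),
  ordered_result = TRUE
)

table(band)

# Seven distinct ratings collapse into three bands;
# the original values cannot be rebuilt from `band` alone.
n_distinct_before <- length(unique(rating))
n_distinct_after  <- nlevels(band)
```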

6.8 Discrete and Continuous Variables

The discrete versus continuous distinction is separate from Stevens’ scheme and cuts across it. Discrete variables take isolated values (typically integers); continuous variables can take any value within an interval.

Note: Typical discrete variables

Count of orders placed, number of defects per batch, number of employees in a branch, number of logins per day. Discrete variables are usually modelled with count-data distributions (Poisson, negative binomial).

Note: Typical continuous variables

Revenue, time, temperature, distance, weight. Continuous variables are usually modelled with continuous distributions (normal, log-normal, gamma). The distinction matters at the modelling stage because the wrong distributional assumption can produce biased confidence intervals and invalid tests.
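The difference between the two kinds of distribution can be sketched with base R's Poisson and normal functions. The example vectors and the rate `lambda = 3` are hypothetical.

```r
orders_per_day  <- c(3L, 0L, 5L, 2L)       # discrete: integer counts
session_minutes <- c(12.4, 3.75, 27.08)    # continuous: real-valued

typeof(orders_per_day)    # "integer"
typeof(session_minutes)   # "double"

# A count distribution puts positive probability on isolated integers.
p_two_orders <- dpois(2, lambda = 3)
total_mass   <- sum(dpois(0:100, lambda = 3))   # sums to (almost exactly) 1

# A continuous distribution assigns probability to intervals, via its density.
p_within_one_sd <- pnorm(1) - pnorm(-1)
total_area      <- integrate(dnorm, -Inf, Inf)$value
```

Fitting a normal model to small counts ignores both the integer support and the lower bound at zero, which is one way the wrong distributional assumption distorts intervals and tests.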

6.9 Qualitative and Quantitative Variables

An older and coarser classification splits variables into qualitative (categorical) and quantitative (numerical). Qualitative variables correspond roughly to nominal and ordinal; quantitative variables correspond to interval and ratio. The split is convenient in everyday language but too blunt for analytics, where the distinction between ordinal and interval, or between interval and ratio, often decides which technique is appropriate.

6.10 Permissible Statistics by Level

| Level | Central tendency | Dispersion | Comparison | Example inferential test |
|---|---|---|---|---|
| Nominal | Mode | - | Frequency, proportion | Chi-square |
| Ordinal | Median, mode | Range, IQR | Rank | Mann-Whitney, Kruskal-Wallis |
| Interval | Mean, median, mode | SD, variance | Difference | t-test, ANOVA |
| Ratio | All of the above plus geometric and harmonic mean | SD, CV | Difference and ratio | All of the above plus log-linear models |
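One way to apply the central-tendency column in code is a small helper that picks the statistic from the object type. `central_tendency()` is a hypothetical function written for this sketch, not part of base R, and it assumes the level of measurement has already been encoded in the R type.

```r
# Hypothetical helper: choose a central-tendency statistic from the R type.
central_tendency <- function(x) {
  if (is.ordered(x)) {
    # Ordinal: median category (type = 1 returns an observed level).
    as.character(quantile(x, probs = 0.5, type = 1))
  } else if (is.factor(x) || is.character(x)) {
    # Nominal: mode, taken as the most frequent category.
    names(which.max(table(x)))
  } else if (is.numeric(x)) {
    # Interval or ratio: arithmetic mean.
    mean(x)
  } else {
    stop("unsupported type for central_tendency()")
  }
}

central_tendency(factor(c("UPI", "Card", "UPI")))
central_tendency(factor(c("Low", "High", "Medium"),
                        levels = c("Low", "Medium", "High"), ordered = TRUE))
central_tendency(c(2, 4, 9))
```

The ordered check comes first because an ordered factor is also a factor in R.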

6.11 Representing Levels in R

The level of measurement should be encoded in the object type so that downstream functions treat it correctly.

Calling mean() on a factor does not return a number: R issues a warning and returns NA. Calling sum() on a factor, ordered or unordered, stops with an error. The object type is itself a form of documentation that protects later analysis from invalid operations.
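A minimal sketch of such an encoding follows, using a hypothetical `orders` data frame. It also shows what base R does when arithmetic is attempted on each type.

```r
# Hypothetical data, read in as plain character and numeric columns.
orders <- data.frame(
  state   = c("TN", "KA", "MH", "TN"),
  csat    = c("Low", "High", "Medium", "High"),
  revenue = c(1200, 850, 430, 990),
  stringsAsFactors = FALSE
)

# Nominal: unordered factor.
orders$state <- factor(orders$state)

# Ordinal: ordered factor with the levels given from low to high.
orders$csat <- factor(orders$csat,
                      levels = c("Low", "Medium", "High"), ordered = TRUE)

class(orders$state)   # "factor"
class(orders$csat)    # "ordered" "factor"

# mean() on a factor warns and returns NA.
mean_of_state <- suppressWarnings(mean(orders$state))

# sum() on a factor stops with an error, captured here with try().
sum_of_csat <- try(sum(orders$csat), silent = TRUE)

# Order-based operations remain available for ordered factors.
top_csat <- max(orders$csat)

# Numeric columns support ordinary arithmetic.
mean_revenue <- mean(orders$revenue)
```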

6.12 Common Mistakes

Warning: Four frequent errors

1. Averaging a single Likert item as if it were interval. An “average satisfaction of 3.4” on a 1-5 scale treats the ordinal gaps as equal when they are not. Report the median, the distribution, or a validated composite score.
2. Computing ratios on interval data. Saying 30°C is “twice as hot” as 15°C is meaningless; rework the question on the Kelvin scale if a ratio is really needed.
3. Coding a category as an integer and fitting linear regression on it. Using 1 = Bronze, 2 = Silver, 3 = Gold, 4 = Platinum and treating the code as continuous imposes arbitrary equal spacing. Use dummy variables or ordinal regression.
4. Leaving ID columns as numeric. Customer ID, PIN code, and account number look numeric but are nominal. Convert them to character or factor so they are never fed into arithmetic or into a regression as a predictor.
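The last two fixes can be sketched in base R. The `customers` data frame and its values are hypothetical, and the dummy coding shown is R's default treatment contrasts for an unordered factor.

```r
# Hypothetical extract in which every column arrived as numeric.
customers <- data.frame(
  customer_id = c(100234, 100871, 101556, 102003),
  pin_code    = c(600001, 560034, 400050, 110001),
  tier_code   = c(1, 3, 2, 4)
)

# IDs and PIN codes are labels: store them as character.
customers$customer_id <- as.character(customers$customer_id)
customers$pin_code    <- as.character(customers$pin_code)

# Arithmetic on a character ID now fails instead of silently "working".
id_arithmetic <- try(customers$customer_id + 1, silent = TRUE)

# Replace the integer tier code with a labelled factor.
customers$tier <- factor(customers$tier_code, levels = 1:4,
                         labels = c("Bronze", "Silver", "Gold", "Platinum"))

# Dummy variables: one 0/1 column per non-baseline tier (Bronze is the baseline).
tier_dummies <- model.matrix(~ tier, data = customers)
colnames(tier_dummies)
```

Passing `tier` to a model formula produces the same dummy columns automatically, so the explicit `model.matrix()` call is only there to make the coding visible.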

6.13 Implications for Modelling

Important: Level of measurement selects the model family

The outcome’s level of measurement chooses the model. A binary nominal outcome calls for logistic regression. A multi-category nominal outcome calls for multinomial logistic regression. An ordinal outcome (satisfaction band, credit tier) calls for ordinal regression (proportional-odds or similar). A discrete count outcome (number of claims, number of logins) calls for Poisson or negative binomial regression. A continuous ratio outcome (revenue, duration) calls for linear regression, possibly after a log transform. Selecting the right family is not a stylistic choice; it affects every inference drawn from the model.
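Three of these pairings can be sketched with base R's `glm()` and `lm()`. The `accounts` data frame is hypothetical and far too small for real inference; it only shows how the outcome type maps to the `family` argument. Ordinal and multi-category outcomes need functions outside the base stats package (for example `MASS::polr` and `nnet::multinom`), so they are not fitted here.

```r
# Hypothetical toy data: one predictor, three outcomes of different kinds.
accounts <- data.frame(
  spend_k = c(1, 2, 3, 4, 5, 6, 7, 8),
  churned = c(1, 1, 0, 1, 0, 1, 0, 0),               # binary nominal outcome
  logins  = c(0, 1, 1, 3, 2, 4, 6, 7),               # discrete count outcome
  revenue = c(120, 150, 210, 260, 330, 400, 520, 640) # continuous ratio outcome
)

# Binary outcome: logistic regression.
fit_binary <- glm(churned ~ spend_k, family = binomial, data = accounts)

# Count outcome: Poisson regression with the default log link.
fit_count <- glm(logins ~ spend_k, family = poisson, data = accounts)

# Continuous ratio outcome: linear regression after a log transform.
fit_ratio <- lm(log(revenue) ~ spend_k, data = accounts)

family(fit_binary)$family
family(fit_count)$link
coef(fit_ratio)
```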

6.14 Summary

Summary of measurement-level concepts introduced in this chapter
| Group | Concept | Description |
|---|---|---|
| Levels of Measurement | Nominal | Unordered categories; only equality is meaningful |
| Levels of Measurement | Ordinal | Ordered categories; gaps not guaranteed equal |
| Levels of Measurement | Interval | Equal spacing; differences meaningful; zero is arbitrary |
| Levels of Measurement | Ratio | True zero; ratios and all arithmetic are meaningful |
| Key Operations | Equality | Two values can be compared for sameness |
| Key Operations | Order | Values can be ranked from low to high |
| Key Operations | Equal intervals | Differences between adjacent values are equal |
| Key Operations | True zero | Zero denotes absence of the quantity |
| Variable Kinds | Discrete | Isolated values, usually integer counts |
| Variable Kinds | Continuous | Any value within an interval; real-valued |
| Legacy Grouping | Qualitative | Legacy label for nominal and ordinal |
| Legacy Grouping | Quantitative | Legacy label for interval and ratio |
| R Representations | factor() | R object for nominal variables |
| R Representations | factor(ordered = TRUE) | R object for ordinal variables |
| R Representations | numeric / integer | R object for interval and ratio variables |

The level of measurement is the single most consequential property of a variable at the point of analysis. Attaching the right level to each variable early (in the data dictionary, in the ingestion code, and in the R object type) prevents a large class of silent errors later.