Figure: Nominal (equality only) → Ordinal (adds order) → Interval (adds equal spacing) → Ratio (adds true zero)
6 Data Types by Levels of Measurement
6.1 Why Levels of Measurement Matter
The level of measurement of a variable determines which statistics, visualisations, and models are legitimate. A variable that looks numeric is not automatically suitable for arithmetic: PIN codes, customer IDs, and the numeric codes of satisfaction ratings are stored as numbers but behave nothing like revenue in rupees. Applying the wrong statistic produces results that look plausible, pass most sanity checks, and can still mislead the decisions built on them.
The theory of measurement levels was formalised by the psychologist S. S. Stevens in a 1946 paper in Science. Stevens’ four levels (nominal, ordinal, interval, ratio) remain the working taxonomy in statistics, business analytics, and the social sciences. Every statistical method carries an implicit assumption about the level of its inputs; understanding that assumption is the first step to using the method correctly.
Level of measurement is not a philosophical footnote. It chooses the statistic (mean versus median versus mode), the chart (bar versus histogram), and the model (linear regression versus logistic versus ordinal). The same data, classified at different levels, will answer different questions.
6.2 Stevens’ Four Levels
Stevens’ 1946 classification arranges measurement into four nested levels. Each level admits the operations of the level below plus one new operation. Moving up the hierarchy adds mathematical structure and expands the set of permissible statistics.
| Level | Equality | Order | Equal intervals | True zero | Indian business example |
|---|---|---|---|---|---|
| Nominal | Yes | No | No | No | State code (TN, KA, MH) |
| Ordinal | Yes | Yes | No | No | CSAT rating (Low, Medium, High) |
| Interval | Yes | Yes | Yes | No | Temperature in °C at warehouse |
| Ratio | Yes | Yes | Yes | Yes | Monthly revenue in ₹ |
6.3 Nominal Scale
The nominal scale classifies observations into categories with no inherent order. The only valid operation is checking whether two values are equal.
Examples include state of residence (TN, KA, MH, DL), product category (electronics, apparel, grocery), payment method (UPI, credit card, net banking, cash-on-delivery), account status (active, dormant, closed), blood group, and gender. The categories are mutually exclusive and collectively exhaustive; the numeric codes sometimes assigned to them (1, 2, 3) are arbitrary labels, not measurements.
Permissible statistics are frequency counts, the mode, proportions, and chi-square tests of association. Reporting a mean over numeric codes is meaningless: the “average payment method” has no interpretation. In R, nominal variables are represented using factor() without the ordered argument.
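As a minimal sketch of the nominal-level statistics described above, the following uses made-up payment-method values and a hypothetical region-by-method contingency table; none of the numbers come from a real dataset.

```r
# Hypothetical payment methods for eight orders (illustrative values only)
payment <- factor(c("UPI", "Card", "UPI", "COD", "UPI", "Card", "NetBanking", "UPI"))

counts <- table(payment)                      # frequency count per category
props  <- prop.table(counts)                  # share of each category
modal  <- names(counts)[which.max(counts)]    # mode: the most frequent category

print(counts)
print(round(props, 3))
print(modal)

# Chi-square test of association on a hypothetical 2 x 2 table of counts
tab <- matrix(c(120, 80, 90, 110), nrow = 2, byrow = TRUE,
              dimnames = list(region = c("South", "North"),
                              method = c("UPI", "Card")))
assoc <- chisq.test(tab)
print(assoc$p.value)
```

Nothing here relies on order or distance between categories: every statistic is built from counts alone.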
6.4 Ordinal Scale
The ordinal scale adds order to the nominal scale. Values can be ranked, but the distance between consecutive ranks is not guaranteed to be equal.
Examples include five-point Likert items (Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree), credit-risk bands (AAA, AA, A, BBB, …), NPS buckets (Detractor, Passive, Promoter), customer tiers (Bronze, Silver, Gold, Platinum), and education level (undergraduate, postgraduate, doctorate). The gap between “Agree” and “Strongly Agree” is not necessarily the same as the gap between “Neutral” and “Agree”.
Permissible statistics are the median, mode, percentiles, and rank-based methods (Mann-Whitney, Kruskal-Wallis, Spearman correlation). Means are not strictly valid, although in practice a mean of Likert items is often reported and treated as approximately interval. In R, ordinal variables are represented using factor(..., ordered = TRUE).
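The ordinal operations above can be sketched as follows. The responses and the repeat-purchase counts are invented for illustration, and taking the median through the integer rank codes with quantile(type = 1) is one reasonable choice among several, not the only way to do it.

```r
likert_levels <- c("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree")

# Hypothetical responses from seven customers
responses <- factor(
  c("Agree", "Neutral", "Strongly Agree", "Agree", "Disagree", "Agree", "Neutral"),
  levels = likert_levels, ordered = TRUE
)

# Order comparisons are defined for ordered factors
n_agree_or_more <- sum(responses >= "Agree")

# Median through the rank codes; type = 1 always returns an observed rank,
# so it never averages two categories
median_code  <- quantile(as.integer(responses), probs = 0.5, type = 1, names = FALSE)
median_label <- likert_levels[median_code]

# Spearman correlation uses ranks only (hypothetical repeat-purchase counts)
repeat_purchases <- c(3, 1, 5, 4, 0, 2, 1)
rho <- cor(as.integer(responses), repeat_purchases, method = "spearman")

print(n_agree_or_more)
print(median_label)
print(rho)
```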
6.5 Interval Scale
The interval scale adds equal spacing between adjacent values, so differences are meaningful. It lacks a true zero, which means ratios are not meaningful.
Examples include temperature in Celsius or Fahrenheit, calendar year (2025 CE), IQ score, and credit score on a bounded band (for example, the 300 to 900 CIBIL range, when treated as interval). The difference between 20°C and 25°C equals the difference between 30°C and 35°C, but 20°C is not “twice as hot” as 10°C because 0°C is a convention, not an absence of temperature.
Permissible statistics are the mean, standard deviation, Pearson correlation, t-tests, and linear regression. Ratios and coefficients of variation are not valid because the zero point is arbitrary. In R, interval variables are stored as numeric.
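A short sketch of why ratios fail on an interval scale: the same three temperatures (chosen arbitrarily for illustration) give a different "ratio" in Celsius, Fahrenheit, and Kelvin, while the ratio of two differences is the same in every unit.

```r
celsius    <- c(10, 20, 35)            # illustrative readings
fahrenheit <- celsius * 9 / 5 + 32     # same temperatures, different arbitrary zero
kelvin     <- celsius + 273.15         # zero here is a true zero

# Ratio of the second reading to the first depends on the unit
ratio_c <- celsius[2] / celsius[1]
ratio_f <- fahrenheit[2] / fahrenheit[1]
ratio_k <- kelvin[2] / kelvin[1]

# Ratio of differences does not depend on the unit
diff_ratio <- function(t) diff(t)[2] / diff(t)[1]

print(c(celsius = ratio_c, fahrenheit = ratio_f, kelvin = ratio_k))
print(c(diff_ratio(celsius), diff_ratio(fahrenheit), diff_ratio(kelvin)))
```

Only the Kelvin ratio has a physical interpretation, because only Kelvin places zero at the absence of the quantity.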
6.6 Ratio Scale
The ratio scale adds a true zero that represents the absence of the quantity. All arithmetic operations are meaningful, including ratios.
Examples include monthly revenue in ₹, units sold, web-session duration in seconds, customer age in years, number of app sessions per week, distance travelled, and inventory count. If a store’s sales doubled from ₹5 lakh to ₹10 lakh, the ratio is meaningful because ₹0 denotes no sales.
Permissible statistics include all arithmetic, the geometric mean, harmonic mean, coefficient of variation, log transforms, any parametric or non-parametric inferential test, and any regression model. In R, ratio variables are stored as numeric (or integer when discrete). The distinction between interval and ratio seldom affects routine analytics, but matters for metrics like growth rates and elasticities that depend on ratios.
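As an illustration of the ratio-only statistics listed above, the sketch below computes period-on-period growth factors, their geometric mean, and the coefficient of variation for a hypothetical revenue series (values in ₹ lakh, invented for the example).

```r
# Hypothetical monthly revenue in INR lakh
revenue_lakh <- c(5, 10, 8, 12)

# Growth factor for each month relative to the previous one
growth <- revenue_lakh[-1] / revenue_lakh[-length(revenue_lakh)]

# Geometric mean of the growth factors: the constant monthly factor
# that would take the first value to the last
geo_mean_growth <- exp(mean(log(growth)))

# Coefficient of variation: spread relative to the mean
cv <- sd(revenue_lakh) / mean(revenue_lakh)

print(growth)
print(geo_mean_growth)
print(cv)
```

Each of these quantities changes if a constant is added to every value, which is why they are only defined when zero is a true zero.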
6.7 The Hierarchy of Measurement Levels
Each level includes the operations of the level below and adds one more. A variable measured at the ratio level can always be degraded to a lower level by discarding information (for example, binning revenue into “Low/Medium/High”), but the reverse is never possible.
Always collect at the highest level the measurement process supports and bin later if needed. A rating captured as an integer from 0 to 10 can be analysed as continuous, ordinal, or nominal; a rating captured only as “Good/Neutral/Bad” cannot be recovered.
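Degrading a ratio variable to an ordinal one can be sketched with cut(). The revenue values and the band thresholds (5 and 10 lakh) are hypothetical choices made for this example.

```r
# Hypothetical monthly revenue in INR lakh
revenue_lakh <- c(2.5, 7, 12, 4, 18, 9)

# Bin into ordered bands: (0, 5] Low, (5, 10] Medium, (10, Inf) High
band <- cut(revenue_lakh,
            breaks = c(0, 5, 10, Inf),
            labels = c("Low", "Medium", "High"),
            ordered_result = TRUE)

print(data.frame(revenue_lakh, band))
print(table(band))
```

Revenues of 7 and 9 both land in "Medium"; once only the band is stored, there is no way to tell them apart again.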
6.8 Discrete and Continuous Variables
The discrete/continuous distinction is orthogonal to Stevens’ scheme and cuts across it. Discrete variables take isolated values (typically integers); continuous variables can take any value within an interval.
Discrete examples include the count of orders placed, number of defects per batch, number of employees in a branch, and number of logins per day. Discrete variables are usually modelled with count-data distributions (Poisson, negative binomial).
Continuous examples include revenue, time, temperature, distance, and weight. Continuous variables are usually modelled with continuous distributions (normal, log-normal, gamma). The distinction matters at the modelling stage because the wrong distributional assumption can produce biased confidence intervals and invalid tests.
6.9 Qualitative and Quantitative Variables
An older and coarser classification splits variables into qualitative (categorical) and quantitative (numerical). Qualitative variables correspond roughly to nominal and ordinal; quantitative variables correspond to interval and ratio. The split is convenient in everyday language but too blunt for analytics, where the distinction between ordinal and interval, or between interval and ratio, often decides which technique is appropriate.
6.10 Permissible Statistics by Level
| Level | Central tendency | Dispersion | Comparison | Example inferential test |
|---|---|---|---|---|
| Nominal | Mode | - | Frequency, proportion | Chi-square |
| Ordinal | Median, mode | Range, IQR | Rank | Mann-Whitney, Kruskal-Wallis |
| Interval | Mean, median, mode | SD, variance | Difference | t-test, ANOVA |
| Ratio | All of the above plus geometric and harmonic mean | SD, CV | Difference and ratio | All of the above plus log-linear models |
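One way to put the table to work is a small helper that picks summary statistics from the object type. The function name describe_by_level is hypothetical, and the sketch assumes the level has already been encoded as an unordered factor, ordered factor, or numeric vector. R cannot distinguish interval from ratio, so that call remains with the analyst.

```r
# Hypothetical helper: choose statistics from the R object type
describe_by_level <- function(x) {
  x <- x[!is.na(x)]
  if (is.ordered(x)) {
    # Median through rank codes; type = 1 returns an observed rank
    mid <- quantile(as.integer(x), probs = 0.5, type = 1, names = FALSE)
    list(level = "ordinal",
         mode = names(which.max(table(x))),
         median = levels(x)[mid])
  } else if (is.factor(x) || is.character(x)) {
    list(level = "nominal",
         mode = names(which.max(table(x))))
  } else if (is.numeric(x)) {
    list(level = "interval or ratio",
         mean = mean(x), median = median(x), sd = sd(x))
  } else {
    stop("unsupported type: ", class(x)[1])
  }
}

# Illustrative inputs
nominal_out <- describe_by_level(factor(c("TN", "KA", "KA", "MH")))
ordinal_out <- describe_by_level(
  factor(c("Low", "High", "Medium", "High", "Medium"),
         levels = c("Low", "Medium", "High"), ordered = TRUE))
numeric_out <- describe_by_level(c(5, 10, 8, 12))

str(nominal_out)
str(ordinal_out)
str(numeric_out)
```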
6.11 Representing Levels in R
The level of measurement should be encoded in the object type so that downstream functions treat it correctly.
Calling mean() on an unordered factor returns NA with a warning rather than a number; calling sum() on an ordered factor stops with an error. The object type is itself a form of documentation that protects later analysis from invalid operations.
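The encoding described above can be sketched as follows, with invented values for each variable. The sketch also shows how base R reacts when an operation does not match the encoded level.

```r
# Nominal: unordered factor
state <- factor(c("TN", "KA", "MH", "KA"))

# Ordinal: ordered factor with explicit level order
csat <- factor(c("Low", "High", "Medium", "High"),
               levels = c("Low", "Medium", "High"), ordered = TRUE)

# Interval and ratio: numeric (R does not distinguish the two)
temp_c      <- c(24.5, 31.0, 28.2, 26.9)
revenue_inr <- c(520000, 1040000, 760000, 880000)

# mean() on a factor returns NA and emits a warning
mean_state <- suppressWarnings(mean(state))

# sum() on an ordered factor raises an error; capture its message
sum_error <- tryCatch(sum(csat), error = function(e) conditionMessage(e))

# Order-based summaries are allowed on ordered factors
top_csat <- max(csat)

print(mean_state)
print(sum_error)
print(top_csat)
```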
6.12 Common Mistakes
1. Averaging a single Likert item as if it were ratio. An “average satisfaction of 3.4” on a 1-5 scale treats the ordinal gaps as equal when they are not. Report the median, the distribution, or a validated composite score.
2. Computing ratios on interval data. Saying 30°C is “twice as hot” as 15°C is meaningless; rework the question on the Kelvin scale if a ratio is really needed.
3. Coding a category as an integer and fitting linear regression on it. Using 1 = Bronze, 2 = Silver, 3 = Gold, 4 = Platinum and treating the code as continuous imposes arbitrary equal spacing. Use dummy variables or ordinal regression.
4. Leaving ID columns as numeric. Customer ID, PIN code, and account number look numeric but are nominal. Convert them to character or factor so they are never fed into arithmetic or into a regression as a predictor.
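Two of these fixes can be sketched in a few lines. The data frame, its column names, and its values are hypothetical; the sketch converts ID-like columns to character and expands an integer tier code into dummy variables with model.matrix().

```r
# Hypothetical customer table with ID-like columns stored as integers
customers <- data.frame(
  customer_id = c(100234L, 100871L, 101552L, 102003L),
  pin_code    = c(600001L, 560034L, 400050L, 110001L),
  tier_code   = c(1L, 3L, 2L, 4L)
)

# IDs are nominal labels: store them as character so arithmetic is blocked
customers$customer_id <- as.character(customers$customer_id)
customers$pin_code    <- as.character(customers$pin_code)

# Replace the integer tier code with a labelled factor
customers$tier <- factor(customers$tier_code, levels = 1:4,
                         labels = c("Bronze", "Silver", "Gold", "Platinum"))

# Dummy (indicator) columns for a regression design matrix;
# Bronze is the reference level under R's default treatment contrasts
dummies <- model.matrix(~ tier, data = customers)

str(customers)
print(dummies)
```

With the IDs stored as character, an accidental mean(customers$customer_id) returns NA with a warning instead of a plausible-looking number.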
6.13 Implications for Modelling
The outcome’s level of measurement chooses the model. A binary nominal outcome calls for logistic regression. A multi-category nominal outcome calls for multinomial logistic regression. An ordinal outcome (satisfaction band, credit tier) calls for ordinal regression (proportional-odds or similar). A discrete count outcome (number of claims, number of logins) calls for Poisson or negative binomial regression. A continuous ratio outcome (revenue, duration) calls for linear regression, possibly after a log transform. Selecting the right family is not a stylistic choice; it affects every inference drawn from the model.
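The mapping from outcome level to model family can be sketched with base R on simulated data. The variable names (spend, churned, logins, revenue) and the coefficients used to simulate them are assumptions made for this example. Ordinal and multinomial outcomes are only noted in comments because they need the MASS and nnet packages.

```r
set.seed(42)
n <- 200
spend <- runif(n, 1, 10)   # hypothetical predictor

# Simulated outcomes at different levels of measurement
churned <- rbinom(n, 1, plogis(1 - 0.3 * spend))        # binary nominal
logins  <- rpois(n, exp(0.2 + 0.15 * spend))            # discrete count
revenue <- exp(1 + 0.2 * spend + rnorm(n, sd = 0.3))    # continuous ratio

# Binary nominal outcome: logistic regression
fit_binary <- glm(churned ~ spend, family = binomial())

# Count outcome: Poisson regression
fit_count <- glm(logins ~ spend, family = poisson())

# Continuous ratio outcome: linear regression after a log transform
fit_revenue <- lm(log(revenue) ~ spend)

# Not run here: an ordinal outcome could use MASS::polr(), and a
# multi-category nominal outcome could use nnet::multinom()

print(coef(fit_binary))
print(coef(fit_count))
print(coef(fit_revenue))
```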
6.14 Summary
| Concept | Description |
|---|---|
| Levels of Measurement | |
| Nominal | Unordered categories; only equality is meaningful |
| Ordinal | Ordered categories; gaps not guaranteed equal |
| Interval | Equal spacing; differences meaningful; zero is arbitrary |
| Ratio | True zero; ratios and all arithmetic are meaningful |
| Key Operations | |
| Equality | Two values can be compared for sameness |
| Order | Values can be ranked from low to high |
| Equal intervals | Differences between adjacent values are equal |
| True zero | Zero denotes absence of the quantity |
| Variable Kinds | |
| Discrete | Isolated values, usually integer counts |
| Continuous | Any value within an interval; real-valued |
| Legacy Grouping | |
| Qualitative | Legacy label for nominal and ordinal |
| Quantitative | Legacy label for interval and ratio |
| R Representations | |
| factor() | R object for nominal variables |
| factor(ordered = TRUE) | R object for ordinal variables |
| numeric / integer | R object for interval and ratio variables |
The level of measurement is the single most consequential property of a variable at the point of analysis. Attaching the right level to each variable early (in the data dictionary, in the ingestion code, and in the R object type) prevents a large class of silent errors later.