How to tell whether a variable should be treated as continuous or categorical?

Question

Suppose I used a multiple linear regression to model the association between cognitive score (Y) and sleep quality (X) along with other variables (gender, age etc.).

cognitive score ~ sleep quality + gender + age + ...

Sleep quality was recorded as a score of 0, 1, 2 or 3. How can I tell whether this variable should be treated as continuous or categorical? Is there a statistical analysis I can use? I tried simply plotting X against Y to check for linearity. However, that plot can still be confounded by gender, age or other variables (which I cannot show in the plot).

The context should decide whether it is a continuous or discrete variable, not a statistical test. Can you tell us more about this variable, what do these numbers represent? — user2974951, Nov 05 '21 at 11:10
Though you do not say that the scale is a "Likert" rating, this may be helpful: https://stats.stackexchange.com/q/10/3277 — ttnphns, Nov 05 '21 at 11:19
About ordinal predictor variables https://stats.stackexchange.com/q/195246/3277 — ttnphns, Nov 05 '21 at 12:36
You can fit the model both ways, in categorical and continuous form, using all other X variables in both models. Then draw graphs of the response function, each way, on the same set of axes, holding all other X variables fixed at, say, their mean values. Drastic differences in the two functions suggest the need to treat the X as categorical; otherwise, use the continuous form for parsimony. There are examples with R code here https://www.routledge.com/Understanding-Regression-Analysis-A-Conditional-Distribution-Approach/Westfall-Arias/p/book/9780367458522# — BigBendRegion, Nov 07 '21 at 14:25
@user2974951 Thank you. This variable is a score used to quantify sleep quality (PSQI score). The score ranges from 0 to 3 and is based on self-report questions covering time spent in bed, daytime sleep, etc. — Ian Wang, Nov 07 '21 at 15:35

score 0 · Answer 1 · answered Nov 05 '21 at 11:37

Your variable is neither.

You have what is called an ordinal variable, which is kind of a hybrid of a numerical variable and a categorical variable. Think of it as an ordered category where the differences between categories are hard to quantify. We know that $1$ meter is the same amount shorter than $2$ meters as $2$ meters is shorter than $3$ meters. I struggle to say something similar for rating sleep quality.

Sometimes it can be fine to treat an ordinal variable as continuous, such as when you calculate your school GPA. Other times, it might make less sense.

I have no problem in calling ordinal scales categorical too, as well as nominal scales. That's a strong pattern in texts on categorical data analysis such as those of Alan Agresti. — Nick Cox, Nov 05 '21 at 13:33

score 0 · Accepted Answer · edited Nov 05 '21 at 13:34

The only elegant solution is to use a Bayesian model where the predictor is coded with indicator variables that are constrained by prior distributions that respect the ordinal nature of the variable. The brm function in the R brms package handles ordered factor variables automatically in this way.

If doing traditional frequentist regression models, the best we usually have time to do is to treat the ordinal variable as quadratic, i.e., include $x$ and $x^2$ in the model, the latter term so as to not assume linearity. For truly continuous variables we would use regression splines instead, because knot location is then not made problematic by excessive ties.

When an ordinal variable has fewer than, say, 4 levels, it is not too inefficient to treat it as categorical using the usual indicator variable approach.
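As a rough illustration of the three frequentist codings mentioned above (linear, quadratic, and indicator variables), here is a minimal Python sketch using only NumPy. Everything in it is an assumption made for the example: the data are simulated, the effect sizes are invented, and the helper names design and fit are hypothetical. It fits each coding with the same covariates, compares residual sums of squares, and computes predictions at each sleep score with age and gender held at their sample means, which is one way to look at the relationship without the other covariates getting in the way.

```python
import numpy as np

# Simulated data (purely illustrative, not PSQI data).
rng = np.random.default_rng(0)
n = 400
sleep = rng.integers(0, 4, size=n)                 # ordinal score 0..3
age = rng.normal(50, 10, size=n)
gender = rng.integers(0, 2, size=n)
level_effect = np.array([0.0, -1.0, -1.5, -4.0])   # deliberately non-linear
y = 30 + level_effect[sleep] - 0.05 * age + 0.5 * gender + rng.normal(0, 1, size=n)


def design(s, a, g, coding):
    """Build a design matrix for the chosen coding of the ordinal score."""
    s = np.asarray(s, dtype=float)
    a = np.asarray(a, dtype=float)
    g = np.asarray(g, dtype=float)
    ones = np.ones_like(s)
    if coding == "linear":
        cols = [ones, s]
    elif coding == "quadratic":
        cols = [ones, s, s ** 2]
    elif coding == "categorical":
        # Indicator variables for levels 1..3; level 0 is the reference.
        cols = [ones] + [(s == k).astype(float) for k in (1, 2, 3)]
    else:
        raise ValueError(f"unknown coding: {coding}")
    return np.column_stack(cols + [a, g])


def fit(X, y):
    """Ordinary least squares; returns coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)


rss, n_params, preds = {}, {}, {}
for coding in ("linear", "quadratic", "categorical"):
    X = design(sleep, age, gender, coding)
    beta, rss[coding] = fit(X, y)
    n_params[coding] = X.shape[1]
    # Predictions at each score, holding the covariates at their sample means.
    grid = design(np.arange(4), np.full(4, age.mean()), np.full(4, gender.mean()), coding)
    preds[coding] = grid @ beta

# Nested-model F statistic: does the categorical coding fit better than linear?
df_num = n_params["categorical"] - n_params["linear"]
df_den = n - n_params["categorical"]
f_stat = ((rss["linear"] - rss["categorical"]) / df_num) / (rss["categorical"] / df_den)

for coding in preds:
    print(coding, "RSS =", round(rss[coding], 1), "adjusted predictions =", np.round(preds[coding], 2))
print("F statistic (linear vs categorical):", round(f_stat, 2), "on", df_num, "and", df_den, "df")
```

If the adjusted predictions from the linear and categorical fits are close and the F statistic is small relative to an F distribution with df_num and df_den degrees of freedom, the linear coding is the more parsimonious choice. Large discrepancies argue for the quadratic or categorical coding instead.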
