Is it possible to conduct a regression if all dependent and independent variables are categorical variables?
- It's certainly possible, even for common or garden regression, so long as the response (dependent) variable is treated purely numerically. Depending on your software, you may need to push or force that to happen. With a suitably wide definition of regression, to include logistic or ordinal regression, it's not only possible, it's commonplace. – Nick Cox Jul 28 '13 at 14:17
- If both the dependent variable and the independent variables are categorical, perhaps the three most popular ways to go are: (i) nominal (binary or multinomial logistic) regression; (ii) categorical regression (an optimal scaling procedure); (iii) some classification/regression tree, such as CHAID. And, of course, there is also logit loglinear analysis, a regression-like procedure similar to (i). – ttnphns Jun 20 '21 at 15:09
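Nick Cox's first comment notes that even ordinary least squares can be run when the response is categorical, provided it is treated numerically. A minimal sketch of that idea, assuming a binary response coded 0/1 and a single three-level predictor (the data and level names `a`, `b`, `c` are made up for illustration): OLS on a dummy-coded predictor, often called a linear probability model, gives fitted values equal to the observed proportion of 1s in each group.

```python
import numpy as np

# Made-up data: a three-level categorical predictor and a binary response coded 0/1.
group = np.array(["a"] * 4 + ["b"] * 4 + ["c"] * 4)
y = np.array([1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0], dtype=float)

levels = np.unique(group)  # sorted, so 'a' is the reference level
# Treatment (dummy) coding: intercept plus one 0/1 column per non-reference level.
X = np.column_stack([np.ones(len(y))] +
                    [(group == lev).astype(float) for lev in levels[1:]])

# Ordinary least squares on the 0/1 response (a linear probability model).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# With one categorical predictor the model is saturated, so each fitted
# value equals the proportion of 1s observed in that group.
for lev in levels:
    print(lev, round(fitted[group == lev][0], 3), y[group == lev].mean())
```

Here the intercept is the proportion in the reference group and the other coefficients are differences in proportions. Nothing keeps fitted values inside [0, 1] in general, which is one reason logistic regression is usually preferred.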
We need to be clear on our terms here, but in general, yes:
- If your dependent variable is continuous (and the residuals are normally distributed—see here), but all of your independent variables are categorical, this is just an ANOVA.
- If your dependent variable is categorical and your independent variables are continuous, this would be logistic regression (binary, ordinal, or multinomial, depending on whether the response has two categories, ordered categories, or more than two unordered categories).
- If both your dependent variable and your independent variables are categorical variables, you can still use logistic regression—it's kind of the ANOVA-ish version of LR.
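To see why the first case "is just an ANOVA": with a single factor in treatment (dummy) coding, the least squares estimates are group means. For a three-level factor with level 1 as the reference, the model $y_i = \beta_0 + \beta_1 D_{2i} + \beta_2 D_{3i} + \varepsilon_i$, where $D_{ji} = 1$ if observation $i$ is in level $j$ and 0 otherwise, has the solution

$$\hat\beta_0 = \bar y_1, \qquad \hat\beta_1 = \bar y_2 - \bar y_1, \qquad \hat\beta_2 = \bar y_3 - \bar y_1,$$

with $\bar y_j$ the mean of $y$ in level $j$. The sketch below checks this numerically and recovers the one-way ANOVA $F$ statistic by comparing the dummy-coded regression with an intercept-only model. It uses only NumPy; the data and the helper names `dummy_design` and `one_way_anova_via_ols` are made up for illustration.

```python
import numpy as np

def dummy_design(group):
    """Intercept plus one 0/1 indicator per non-reference level (first sorted level is the reference)."""
    group = np.asarray(group)
    levels = np.unique(group)
    cols = [np.ones(len(group))] + [(group == lev).astype(float) for lev in levels[1:]]
    return np.column_stack(cols), levels

def one_way_anova_via_ols(y, group):
    """Fit OLS on the dummy-coded factor; return the coefficients and the one-way ANOVA F statistic."""
    y = np.asarray(y, dtype=float)
    X, levels = dummy_design(group)
    n, k = len(y), len(levels)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_full = np.sum((y - X @ beta) ** 2)  # within-group sum of squares
    rss_null = np.sum((y - y.mean()) ** 2)  # total sum of squares (intercept-only model)
    # Extra-sum-of-squares F test on (k - 1, n - k) degrees of freedom
    F = ((rss_null - rss_full) / (k - 1)) / (rss_full / (n - k))
    return beta, F

# Made-up data: continuous response, three-level factor
y = [4.1, 5.0, 4.6, 6.2, 6.8, 6.5, 5.1, 5.5, 5.3]
group = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
beta, F = one_way_anova_via_ols(y, group)
print("coefficients:", beta)
print("F statistic:", F)
```

The reduction in residual sum of squares from the intercept-only model to the dummy-coded model is exactly the between-groups sum of squares, which is why the two formulations give the same $F$.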
Note that both logistic regression and ordinary least squares (linear) regression are special cases of the Generalized Linear Model.
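Because logistic regression is a Generalized Linear Model, it can be fitted with iteratively reweighted least squares (IRLS), the standard GLM fitting algorithm, in which each iteration is a weighted linear regression. A minimal NumPy sketch for the binary case with a dummy-coded categorical predictor follows; the data, the helper name `fit_logistic_irls`, and the fixed iteration count (no convergence check) are assumptions for illustration. With one categorical predictor the fit is saturated, so the intercept is the log-odds in the reference group and the other coefficients are log-odds ratios against it, which is the "ANOVA-ish" reading of LR.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25):
    """Binary logistic regression (binomial GLM, logit link) fitted by IRLS."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):               # fixed iteration count; no convergence check
        eta = X @ beta                    # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))   # inverse logit gives fitted probabilities
        w = mu * (1.0 - mu)               # IRLS weights = binomial variance
        z = eta + (y - mu) / w            # working response
        XtW = X.T * w                     # X'W without forming the diagonal matrix
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Made-up data: binary outcome, three-level categorical predictor ('a' is the reference).
group = np.array(["a"] * 4 + ["b"] * 4 + ["c"] * 4)
y = np.array([1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0], dtype=float)
levels = np.unique(group)
X = np.column_stack([np.ones(len(y))] +
                    [(group == lev).astype(float) for lev in levels[1:]])

beta = fit_logistic_irls(X, y)
print("log-odds coefficients:", beta)
print("odds ratios vs 'a':", np.exp(beta[1:]))
```

If every observation in some group has the same outcome (all 0 or all 1), the maximum likelihood estimate does not exist and this loop would drift toward infinite coefficients; production GLM software typically warns about such separation.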

gung - Reinstate Monica
- It is the third case that you mentioned. I tried LR, and none of the coefficients were significant. I thought I might be doing something wrong. – altruist Jul 28 '13 at 14:20
- I don't think ANOVA _requires_ a continuous dependent variable any more than it _requires_ normally distributed residuals. These are just conditions under which ANOVA is expected to work well. – Nick Cox Jul 28 '13 at 14:20
- @NickCox, you're right, of course; we're quibbling over how we define & apply these terms. The way I would put it is that the model is derived from those assumptions, but the ANOVA can be used even if they aren't met; whether the results will then be helpful depends on the situation. – gung - Reinstate Monica Jul 28 '13 at 14:25
- @altruist, I laid out the three cases for the sake of conceptual clarity; I recognize that the last is what you want. Note that whether or not you're using the software correctly to fit the model & whether or not your coefficients are 'significant' are unrelated to whether or not LR is the appropriate model for your situation. – gung - Reinstate Monica Jul 28 '13 at 14:27
- Note that being categorical is sometimes a matter of definition for the software, and sometimes in the mind of the beholder. What is number of children, for example? – Nick Cox Jul 28 '13 at 14:31
- There are many reasons why there might be no significant coefficients. – Peter Flom Jul 28 '13 at 15:33
- I have trouble believing that number of children is categorical. A family with two kids has one more kid than a family with one, and a family with six kids has three times as many as a family with two. – Dave Jun 20 '21 at 15:17
- @Dave Surely, but that doesn't always drive the analysis. Number of children might be treated as if a measured predictor or as a categorical predictor; even the ordering ($1 < 2 < 3 < \cdots$) might have no bearing on how a model is parameterised or estimated. – Nick Cox Jul 16 '21 at 22:21
- Hi @gung, I'm struggling to get an answer to a similar question. I would like to derive the least squares estimates of a simple linear regression with a three-level categorical variable. Can you please provide me with any guidance? Here is the link to my question: stats.stackexchange.com/questions/546082/… – Blg Khalil Sep 28 '21 at 05:01
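The exchange above about number of children can be made concrete: the same count can enter a regression either as one numeric column (a single slope, so equal steps between 0, 1, 2, 3 children) or as a set of dummy indicators (no ordering or spacing imposed). A minimal sketch with made-up data, using OLS for simplicity; the variable names are hypothetical. Because the count takes only four values here, the categorical coding contains the linear trend as a special case, so it uses more parameters but never fits worse in-sample.

```python
import numpy as np

# Made-up data: number of children per household and an arbitrary continuous outcome.
children = np.array([0, 0, 1, 1, 2, 2, 3, 3])
outcome = np.array([2.0, 2.4, 3.1, 2.9, 4.2, 3.8, 4.9, 5.1])
n = len(children)

designs = {
    # Measured predictor: intercept plus one slope for the count.
    "numeric": np.column_stack([np.ones(n), children]),
    # Categorical predictor: 0 children is the reference; one indicator per other count.
    "categorical": np.column_stack([np.ones(n)] +
                                   [(children == c).astype(float) for c in (1, 2, 3)]),
}

rss = {}
for name, X in designs.items():
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    rss[name] = float(np.sum((outcome - X @ beta) ** 2))
    print(f"{name}: {X.shape[1]} parameters, RSS = {rss[name]:.3f}")
```

Which coding is appropriate is a modelling decision rather than something the data type dictates; comparing the two nested fits (for example with an F test) is one way to check whether the linear-trend restriction is reasonable.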