I found a few similar questions, but none quite had the answer I was looking for.

I plan to use logistic regression and have two IVs (independent variables) that each have three categories. So far, I have only seen continuous or binary categorical IVs used in ordinal logistic regression. I also cannot convert these IVs to a continuous format, as that would change how the variables are interpreted.

Can I include these IVs in their current state, or should I be looking at another algorithm here (e.g. a decision tree)?

Thanks.

Alecos Papadopoulos
SRobertson
  • If an IV is nominal, you should code it as a set of contrast variables (see Nick Stauner's answer). If an IV is ordinal (and you don't want to think back and treat it as nominal or interval), then you might a) rank its values first, b) use optimal scaling categorical regression, or c) use a decision tree. See a related topic on correlating [nominal and ordinal variables](http://stats.stackexchange.com/a/73118/3277). – ttnphns Jan 07 '14 at 07:31

3 Answers

Yes, you can include categorical variables with multiple levels in logistic regression.

Whether trees are a good alternative is another question.

Peter Flom

Here's an answer from a different forum about how you might use coding to handle your polytomous variables in regression in general. The original question there was about logistic regression too, which corroborates other answers here: the distinction between logistic and linear regression isn't important when it comes to coding polytomous variables. Here's another answer on Cross Validated from @StasK to a question just like yours, again suggesting coding.

However, @GaetanLion's answer to one of those similar questions discusses a drawback of coding (mostly interpretive complexity, I think) and emphasizes that coding may not be necessary, depending on your statistical software. On the other hand, judging from @gung's answer to another very similar question about the interpretive complexity of an analysis like yours, some software may code automatically, and a different test is needed to estimate the significance of the polytomous factor as a whole (as opposed to particular levels).
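To illustrate what testing the factor "as a whole" can look like, here is a rough sketch of a likelihood-ratio test, under simplifying assumptions that are mine rather than the linked answers': a single three-level factor is the only predictor, so the fitted probabilities of the full model equal the observed group proportions and no iterative fitting is needed. The counts are invented. With two factors, as in the question, you would instead fit the model with and without the factor in your statistical software and compare the two fits.

```python
import math


def bernoulli_loglik(successes, trials):
    """Maximised Bernoulli log-likelihood for one group (p = successes / trials)."""
    failures = trials - successes
    ll = 0.0
    if successes:
        ll += successes * math.log(successes / trials)
    if failures:
        ll += failures * math.log(failures / trials)
    return ll


# Hypothetical data (invented): level -> (events, observations).
counts = {"low": (10, 50), "medium": (20, 50), "high": (30, 50)}

# Full model (intercept + two dummies): with one factor and no other
# predictors, its fitted probabilities are the group proportions.
ll_full = sum(bernoulli_loglik(s, n) for s, n in counts.values())

# Null model (intercept only): one common probability for all levels.
total_events = sum(s for s, _ in counts.values())
total_obs = sum(n for _, n in counts.values())
ll_null = bernoulli_loglik(total_events, total_obs)

# Likelihood-ratio statistic; degrees of freedom = number of dummies dropped.
lr_stat = 2.0 * (ll_full - ll_null)
df = len(counts) - 1

# The closed form below is the chi-square upper tail for exactly 2 degrees
# of freedom; other df values need a general chi-square routine.
if df != 2:
    raise NotImplementedError("closed-form p-value shown only for df = 2")
p_value = math.exp(-lr_stat / 2.0)

print(f"LR statistic = {lr_stat:.3f}, df = {df}, p = {p_value:.5f}")
```

The point of the sketch is that the two dummy coefficients are dropped together and judged by a single statistic, rather than reading the significance of each level separately.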

Nick Stauner

Categorical IVs with more than two groups are perfectly fine in a logistic regression, just as in multiple regression. If the dependent variable has more than two groups, however, then a multinomial logistic regression (or, for ordered categories, an ordinal one) is appropriate. The UCLA website has a great tutorial for ordinal logistic regression in R. The example there only uses a categorical predictor with 2 levels, but the same logic applies (much like in a multiple regression).

Nick Cox
dmartin
  • Great @dmartin. I only have the two groups in my dependent variable, so ordinal logistic regression should be appropriate. – SRobertson Jan 07 '14 at 00:12
  • If you have two groups in your dependent variable, that's logistic regression plain and simple. – Nick Cox Jan 07 '14 at 00:17