
For categorical variables, one-hot encoding is a must if the variable is non-binary. But what about ordinal variables? Their levels are ordered but still mutually exclusive. Apart from labelling, do they require the same treatment as categorical variables?

Shiv_90

1 Answer


The proper treatment of ordinal independent variables in regression is tricky.

The two most common approaches are:

  1. Treat it as continuous (but this ignores the fact that the differences between adjacent levels may not be equal).

  2. Treat it as categorical (but this ignores the ordered nature of the variable).

The first method would not require one-hot encoding. The second would.
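To make the contrast concrete, here is a minimal sketch of both encodings in plain Python. The level names and the sample data are hypothetical, made up purely for illustration, and dropping the first level as the reference category is one common choice for regression with an intercept, not the only one.

```python
# Hypothetical ordered levels for an ordinal predictor (illustrative only).
LEVELS = ["low", "medium", "high"]


def encode_as_continuous(values, levels=LEVELS):
    """Approach 1: map each level to its rank, giving a single numeric column.

    This assumes equal spacing between adjacent levels.
    """
    rank = {level: i for i, level in enumerate(levels)}
    return [rank[v] for v in values]


def encode_as_categorical(values, levels=LEVELS):
    """Approach 2: one-hot encode, dropping the first level as the reference.

    This yields len(levels) - 1 indicator columns and discards the ordering.
    """
    return [[1 if v == level else 0 for level in levels[1:]] for v in values]


sample = ["low", "high", "medium", "high"]
continuous = encode_as_continuous(sample)
one_hot = encode_as_categorical(sample)

print(continuous)  # one column of ranks
print(one_hot)     # indicator columns for "medium" and "high"
```

The first encoding costs one coefficient but forces a linear effect across levels; the second costs one coefficient per non-reference level and lets each level have its own effect.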

Some new methods have been developed. One that I have sometimes found useful is optimal scaling.

Peter Flom
  • In my case there are two variables that I believe are ordinal: `dew_point` and `visibility_in_miles`. Both of them affect `traffic_volume`, which is the target. I have been able to handle the categorical variables as they should be, but I'm stuck with these ordinal variables. – Shiv_90 Aug 27 '19 at 11:53
  • Why would those be ordinal? They are both quantities. You should be able to have them as continuous variables. – Peter Flom Aug 27 '19 at 12:19
  • Your point is right. But in my dataset `dew_point` and `visibility_in_miles` range from 0 to 9 with discrete values, so they are not continuous in this case. And since there are only 9 levels for both variables, I believe they should be treated as ordinal. – Shiv_90 Aug 28 '19 at 08:48
  • That's an interesting case. If the 9 levels are coded in such a way that they can be made continuous, then that might be better. 9 levels is a lot. – Peter Flom Aug 28 '19 at 12:31