
For most ordinal features, the scaling seems to be linear, e.g. [1, 2, 3, 4], with a higher code representing a larger effect on the target variable.

But is it possible to encode the feature in a nonlinear fashion, such as [1, 2, 4, 8]? What would the possible impact be on machine learning models such as neural networks and random forests?

kjetil b halvorsen
K_inverse

1 Answer


Given that the nonlinear transform is monotonic (and therefore invertible), it won't have a visible effect on a random forest, because the decision trees trained internally find split points for these features. For example, if with the original encoding the best split point (in terms of entropy or Gini impurity) lies between 2 and 3, after the transformation it will lie between 2 and 4, separating exactly the same samples.
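To make this concrete, here is a minimal sketch (not from the original answer) with a hand-rolled Gini split finder, so you can see that the best threshold moves from 2.5 to 3.0 under the [1, 2, 3, 4] → [1, 2, 4, 8] re-encoding but induces the same partition of the samples:

```python
def best_split(xs, ys):
    """Return the threshold minimising weighted Gini impurity for a 1-D split."""
    def gini(labels):
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        return 2 * p * (1 - p)

    candidates = sorted(set(xs))
    best_score, best_t = float("inf"), None
    # Candidate thresholds lie midway between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if score < best_score:
            best_score, best_t = score, t
    return best_t

# Ordinal feature with linear codes; the target flips between codes 2 and 3.
x_lin = [1, 1, 2, 2, 3, 3, 4, 4]
y     = [0, 0, 0, 0, 1, 1, 1, 1]

# Monotonic nonlinear re-encoding: 1, 2, 3, 4 -> 1, 2, 4, 8.
remap = {1: 1, 2: 2, 3: 4, 4: 8}
x_non = [remap[x] for x in x_lin]

t_lin = best_split(x_lin, y)   # 2.5 (between 2 and 3)
t_non = best_split(x_non, y)   # 3.0 (between 2 and 4)

# Both thresholds split the samples into exactly the same two groups.
part_lin = [x <= t_lin for x in x_lin]
part_non = [x <= t_non for x in x_non]
print(t_lin, t_non, part_lin == part_non)  # -> 2.5 3.0 True
```

The same invariance holds for any order-preserving transform, which is why tree-based models are generally insensitive to how you space the ordinal codes.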

For neural networks, the scale of the feature changes. This can hurt gradient-descent performance if the scale becomes too large: a transformation such as $100^x$, even after feature standardisation, produces data points that look like outliers.
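A quick sketch of that failure mode (my own illustration, assuming six ordinal levels): after standardising $100^x$, the lower levels collapse onto nearly the same value while the top level sits far away, so the ordinal distinctions between low levels are effectively erased.

```python
import numpy as np

levels = np.arange(1, 7)       # ordinal codes 1..6
encoded = 100.0 ** levels      # explosive nonlinear encoding: 1e2 .. 1e12

# Standardise to zero mean / unit variance, as is common before a NN.
z = (encoded - encoded.mean()) / encoded.std()
print(np.round(z, 3))

# Gap between the two lowest levels is essentially zero after scaling,
# while the top level is isolated like an outlier.
print(z[1] - z[0], z[-1] - z[-2])
```

So even though standardisation controls the overall range, the relative spacing the network sees is dominated by the largest level, which makes the gradients with respect to this feature behave as if it were a near-binary indicator for the top category.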

gunes