Do we really need to drop first in one-hot encoding?

Question

Is there any consensus on whether one needs to drop the first column when performing one-hot encoding?

With reference to here and here, my understanding is that you only need to leave one column out when using OLS, because of the singularity issue, and that with regularized or non-linear models there is no need to do this. I just want to clarify: is there any dummy variable trap here?
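To make the singularity issue concrete, here is a minimal sketch using NumPy only. The 3-level factor and its integer codes are made up for illustration. With an intercept, the k indicator columns sum to the intercept column, so the design matrix loses one rank; dropping one indicator restores full column rank.

```python
import numpy as np

# Hypothetical 3-level categorical feature, stored as integer codes.
codes = np.array([0, 1, 2, 0, 1, 2, 0, 1])
k = 3

one_hot = np.eye(k)[codes]                 # all k indicator columns
intercept = np.ones((len(codes), 1))

# Intercept + all k indicators: the indicators sum to the intercept column.
X_full = np.hstack([intercept, one_hot])
# Intercept + (k - 1) indicators: the first level becomes the reference.
X_drop = np.hstack([intercept, one_hot[:, 1:]])

rank_full = np.linalg.matrix_rank(X_full)
rank_drop = np.linalg.matrix_rank(X_drop)

print(X_full.shape[1], rank_full)   # more columns than rank: X'X is singular
print(X_drop.shape[1], rank_drop)   # columns equal rank: X'X is invertible
```

In the first case the OLS normal equations have no unique solution, which is the trap the question refers to.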

Dropping one of the columns from one-hot encoding is called dummy coding. You need it when using linear regression, GLMs, etc. You don’t need it for most of the ML algorithms, including regularized regression. You’ll find more details in the thread linked above. — Tim, Oct 16 '21 at 09:06
@Tim, I don't completely agree with you on terminology. All k dummies corresponding to a k-level factor are still "dummy variables". It is not dropping one of them from the set that makes them "dummy". But that is a minor point. — ttnphns, Oct 16 '21 at 10:34
@ttnphns agree on “dummies”, but “dummy encoding” and “one-hot encoding” are rather settled names. — Tim, Oct 16 '21 at 11:17
@Tim, why would the definition of "dummies" differ from "variables produced by dummy (aka indicator, aka one-hot) (en)coding"? Moreover, whether you input all k or just k-1 of them as predictors depends on the model and the algorithm. Example one: without an intercept, inputting all the dummies is all right. Example two: a general linear model implemented via generalized inversion will internally encode a factor into k, not k-1, indicators, which has some technical advantages. — ttnphns, Oct 16 '21 at 11:32
And I don't agree with the notion that "dummy encoding = omitting one of the k indicators" is a settled usage of terminology. It may be settled in some communities tied to particular software, but not in others. — ttnphns, Oct 16 '21 at 11:32
@ttnphns I don’t find this distinction especially useful either, but it seems to be common in CS/ML vs statistics communities. — Tim, Oct 16 '21 at 11:57
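Two of the cases raised in the comments can be checked numerically. The sketch below uses NumPy only, with made-up data and an arbitrary penalty `lam = 1.0`. It assumes a plain ridge penalty applied to every coefficient, intercept included, purely to keep the algebra short. It shows that (1) without an intercept, all k indicators give a full-rank design whose OLS coefficients are the group means, and (2) with an intercept plus all k indicators, the ridge system is still uniquely solvable even though the unpenalized one is not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced 3-level factor and a noisy response.
codes = np.tile([0, 1, 2], 10)
k = 3
y = np.array([1.0, 2.0, 4.0])[codes] + rng.normal(scale=0.1, size=codes.size)

one_hot = np.eye(k)[codes]

# Case 1: no intercept, all k indicators. The design has full column rank.
rank_no_intercept = np.linalg.matrix_rank(one_hot)
beta_no_intercept, *_ = np.linalg.lstsq(one_hot, y, rcond=None)
group_means = np.array([y[codes == j].mean() for j in range(k)])

# Case 2: intercept + all k indicators, ridge penalty (assumed lam = 1.0).
X_full = np.hstack([np.ones((codes.size, 1)), one_hot])
gram = X_full.T @ X_full                       # singular without a penalty
lam = 1.0
gram_ridge = gram + lam * np.eye(X_full.shape[1])
beta_ridge = np.linalg.solve(gram_ridge, X_full.T @ y)

print(rank_no_intercept, np.allclose(beta_no_intercept, group_means))
print(np.linalg.matrix_rank(gram), np.linalg.matrix_rank(gram_ridge))
```

Adding `lam` to the diagonal makes the Gram matrix positive definite, which is why regularized regression does not require dropping a column.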


0 Answers