In classical statistical regression analysis (e.g. linear regression) one level of the categorical variable is usually not used to create a dummy variable to create a reference (e.g. there is only one column gender_male). I understand why.
I noticed that many machine learning models appear to still use all level whilst "one hot encoding". So gender with 2 levels results in 2 columns: gender_male and gender_female. This may lead to the curse of dimensionality having an impact. So not sure why ML folk do this.
Anyway, Can one still leave out the redundant level or is there a reason to use 2 columns using the simple example?
Please note that my specific model is an ANN (this). Thus, I am not using "statistical regression" (neither standard/lasso/ridge/elastic) and I am not interested in interpretability. Collinearity should also not be an issue AFIK.
PS:
I found a potentially other discussion on the context, which may help someone here.
PPS:
I am more inclined to use binary encoding for ANN now.