
Suppose you have data with a bunch of predictors, and some of these predictors are proportions that add up to one. An example would be data like the following:

gender  perc_shop   perc_game perc_stud   age
M       .23         .71       .06         31
F       .47         0         .53         19
F       .05         .31       .64         29

The variables in columns 2-4 all add up to one, so in logistic regression it would be necessary to remove one as a baseline variable. However, when building a classification model using machine learning methods (e.g. decision trees, random forests, SVMs, etc.), would it be necessary to remove one of the variables? A rough sketch of what I mean is below.
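To make the two setups concrete, here is a minimal sketch using scikit-learn with the example columns above and a hypothetical binary target `y` (not part of the original data): the proportion columns are perfectly collinear, so one (here `perc_stud`) is dropped for the logistic regression, while all of them are kept for the random forest.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Example rows from the table above (gender omitted for simplicity)
df = pd.DataFrame({
    "perc_shop": [0.23, 0.47, 0.05],
    "perc_game": [0.71, 0.00, 0.31],
    "perc_stud": [0.06, 0.53, 0.64],
    "age":       [31, 19, 29],
})
y = [0, 1, 1]  # hypothetical class labels, just for illustration

# Logistic regression: drop one proportion column as the baseline,
# since perc_shop + perc_game + perc_stud = 1 for every row.
X_logit = df[["perc_shop", "perc_game", "age"]]
LogisticRegression().fit(X_logit, y)

# Random forest: splits consider one feature at a time, so the
# redundant proportion column is typically left in.
X_rf = df[["perc_shop", "perc_game", "perc_stud", "age"]]
RandomForestClassifier(n_estimators=100).fit(X_rf, y)
```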

Possible duplicate of [Why is multicollinearity not checked in modern statistics/machine learning](https://stats.stackexchange.com/questions/168622/why-is-multicollinearity-not-checked-in-modern-statistics-machine-learning) – Sycorax Jul 24 '17 at 21:52

0 Answers