
Suppose you have data with a bunch of predictors, and some of these predictors are proportions that add up to one. An example would be data like the following:

gender  perc_shop   perc_game perc_stud   age
M       .23         .71       .06         31
F       .47         0         .53         19
F       .05         .31       .64         29

The variables in columns 2-4 all add up to one, so in logistic regression it would be necessary to remove one as a baseline variable. However, when building a classification model using machine learning methods (e.g. decision trees, random forests, SVMs, etc.), would it be necessary to remove one of the variables? A rough sketch of what I mean is below.
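To make the two setups concrete, here is a minimal sketch using scikit-learn with the example columns above and a hypothetical binary target `y` (not part of the original data): the proportion columns are perfectly collinear, so one (here `perc_stud`) is dropped for the logistic regression, while all of them are kept for the random forest.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Example rows from the table above (gender omitted for simplicity)
df = pd.DataFrame({
    "perc_shop": [0.23, 0.47, 0.05],
    "perc_game": [0.71, 0.00, 0.31],
    "perc_stud": [0.06, 0.53, 0.64],
    "age":       [31, 19, 29],
})
y = [0, 1, 1]  # hypothetical class labels, just for illustration

# Logistic regression: drop one proportion column as the baseline,
# since perc_shop + perc_game + perc_stud = 1 for every row.
X_logit = df[["perc_shop", "perc_game", "age"]]
LogisticRegression().fit(X_logit, y)

# Random forest: splits consider one feature at a time, so the
# redundant proportion column is typically left in.
X_rf = df[["perc_shop", "perc_game", "perc_stud", "age"]]
RandomForestClassifier(n_estimators=100).fit(X_rf, y)
```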

Possible duplicate of [Why is multicollinearity not checked in modern statistics/machine learning](https://stats.stackexchange.com/questions/168622/why-is-multicollinearity-not-checked-in-modern-statistics-machine-learning) – Sycorax Jul 24 '17 at 21:52

0 Answers