When we run a logistic regression, we get p-values for all the input variables, which help us choose significant inputs. Similarly, can we use classification trees to pick the variables that are split on, and then use those variables in the model? I think the fact that splitting the dataset on a variable lowers the classification error should be a good indicator of that variable's predictive power. Is that true?
0
-
Some smart-aleck is going to answer "Yes" (which, incidentally, I believe is the right answer). Is there more that you are really intending to ask? – rolando2 Apr 24 '12 at 02:09
1 Answer
3
It is generally not valid to "choose significant inputs"; doing so is a form of highly problematic stepwise regression. Using classification trees to form inputs for logistic regression is even more problematic because it simultaneously increases both type I and type II error.
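
To make the type I error inflation concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not the analysis behind the answer: every predictor is pure noise, a simple two-sample z-test stands in for the logistic regression p-values, and the names `z_stat` and `simulate_double_dipping` are hypothetical. It picks the most "significant" of 20 noise predictors and then tests it at the 5% level, once on the same data used for selection and once on fresh data.

```python
import random
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def z_stat(x, y):
    """Two-sample z statistic for x between the y == 1 and y == 0 groups."""
    g1 = [xi for xi, yi in zip(x, y) if yi == 1]
    g0 = [xi for xi, yi in zip(x, y) if yi == 0]
    se = sqrt(variance(g1) / len(g1) + variance(g0) / len(g0))
    return (mean(g1) - mean(g0)) / se


def simulate_double_dipping(n_sims=500, n=100, k=20, seed=0):
    """Return (rejection rate on the selection data, rejection rate on fresh data)."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]  # balanced binary outcome, unrelated to any x
    same_hits = 0
    fresh_hits = 0
    for _ in range(n_sims):
        # k candidate predictors, all pure noise (assumption of this sketch)
        X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
        # "Choose the significant input": keep the predictor with the largest |z|
        best_abs_z = max(abs(z_stat(x, y)) for x in X)
        # Test the chosen predictor on the SAME data used to choose it
        same_hits += int(best_abs_z > 1.96)
        # Test it on FRESH data; since it is noise, a fresh sample is a new draw
        x_new = [rng.gauss(0, 1) for _ in range(n)]
        fresh_hits += int(abs(z_stat(x_new, y)) > 1.96)
    return same_hits / n_sims, fresh_hits / n_sims


if __name__ == "__main__":
    same, fresh = simulate_double_dipping()
    print(f"rejection rate, same data:  {same:.2f}")
    print(f"rejection rate, fresh data: {fresh:.2f}")
```

With 20 independent noise predictors, the chance that at least one crosses the nominal 5% threshold is about 1 - 0.95^20 ≈ 0.64, so the same-data rejection rate should land far above 0.05, while the fresh-data rate should stay near the nominal level.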

Frank Harrell
-
+1. Maybe it would help the OP if you could explain why and how "choosing significant inputs" leads to invalid conclusions, and how type I and type II risk are increased: for what test, and in what sense "simultaneously". – Momo Jun 23 '12 at 17:20
-
also +1, @Momo, I agree w/ your suggestions here. For the interim, I provided a conceptual overview of why using 'significance' leads to problems [here](http://stats.stackexchange.com/questions/20836/algorithms-for-automatic-model-selection/20856#20856). – gung - Reinstate Monica Jun 23 '12 at 17:32
-
Choosing "significant inputs" is [double dipping](http://www.nature.com/neuro/journal/v12/n5/abs/nn.2303.html). It systematically biases the model in favor of extreme predictions and destroys type I error. Tree methods increase type II error because of their ineffective way of handling continuous predictors and additivity. The link from @gung is excellent. To address the greatly increased type I error you would need very complex adjustment, but this wouldn't help with the type II error or with the generally poor future predictive performance of such approaches. – Frank Harrell Jun 23 '12 at 17:54
-
@FrankHarrell, thanks for the compliment, that's very kind of you. I didn't know that about CART, but it makes sense; I finally purchased a copy of your book (I got tired of the fact that it's always checked out at the library) so maybe I'll understand these things better soon. – gung - Reinstate Monica Jun 23 '12 at 18:15
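
The type II error point raised in the comments above (trees handling continuous predictors ineffectively) can be sketched with a small power simulation. This is a rough illustration, assuming a single continuous predictor with a true logistic effect (`beta=0.4` is an arbitrary choice), a fixed median split as a simplified stand-in for a tree split, and z-tests in place of a fitted model; the function names are hypothetical.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def z_continuous(x, y):
    """Two-sample z statistic using x as a continuous predictor."""
    g1 = [xi for xi, yi in zip(x, y) if yi == 1]
    g0 = [xi for xi, yi in zip(x, y) if yi == 0]
    se = math.sqrt(variance(g1) / len(g1) + variance(g0) / len(g0))
    return (mean(g1) - mean(g0)) / se


def z_median_split(x, y):
    """Two-proportion z statistic after cutting x at its median."""
    cut = sorted(x)[len(x) // 2]
    high = [yi for xi, yi in zip(x, y) if xi >= cut]
    low = [yi for xi, yi in zip(x, y) if xi < cut]
    pooled = mean(y)
    se = math.sqrt(pooled * (1 - pooled) * (1 / len(high) + 1 / len(low)))
    return (mean(high) - mean(low)) / se


def estimate_power(n_sims=500, n=200, beta=0.4, seed=1):
    """Return (power with continuous x, power with median-split x)."""
    rng = random.Random(seed)
    hits_continuous = 0
    hits_split = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # Outcome truly depends on x through a logistic model (assumption)
        y = [int(rng.random() < 1 / (1 + math.exp(-beta * xi))) for xi in x]
        hits_continuous += int(abs(z_continuous(x, y)) > 1.96)
        hits_split += int(abs(z_median_split(x, y)) > 1.96)
    return hits_continuous / n_sims, hits_split / n_sims


if __name__ == "__main__":
    power_continuous, power_split = estimate_power()
    print(f"power, continuous x:   {power_continuous:.2f}")
    print(f"power, median-split x: {power_split:.2f}")
```

Under these assumptions the dichotomized predictor should detect the real effect less often than the continuous one, which is the loss of power (higher type II error) that comes from reducing a continuous variable to a single split.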