I am doing multiple regression analysis, in which I want to eliminate some of the insignificant features. Most machine learning books use subset selection, shrinkage methods, or PCA to reduce the number of features. Why are p-values not commonly used for feature selection?
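For concreteness, here is a minimal sketch (my own illustration on simulated data, using scikit-learn's `LassoCV`) of one of the shrinkage methods mentioned above: the Lasso drives the coefficients of weak features exactly to zero, so selection comes out of the fit itself rather than from per-coefficient p-values.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, k = 200, 10
X = rng.normal(size=(n, k))
# Only the first three features actually matter in this simulated data.
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(size=n)

lasso = LassoCV(cv=5).fit(X, y)   # penalty strength chosen by cross-validation
kept = np.flatnonzero(lasso.coef_ != 0)
print("features kept by the Lasso:", kept)
```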
- [This is why.](http://stats.stackexchange.com/a/20856/1352) (Whether you do it in a stepwise manner or all at once doesn't change the fundamental problem.) – Stephan Kolassa Nov 19 '15 at 08:20
- @Stephan: I read the answer. Does it imply p-values should never be used? – Siddhesh Nov 19 '15 at 12:40
- No. You can use and interpret p-values if you use them correctly. [This is a good place to start understanding them.](http://stats.stackexchange.com/questions/tagged/p-value?sort=votes&pageSize=50) In your specific case, if you look at multiple models (by selecting features), the p-values will no longer be uniformly distributed under the null hypothesis, so you either need to find their new distribution (e.g., through simulation) or interpret them differently. – Stephan Kolassa Nov 19 '15 at 12:45
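To make that last point concrete, here is a minimal simulation sketch (my own, using statsmodels OLS on a pure-noise response): every null hypothesis is true, yet the smallest of ten candidate p-values lands below 0.05 roughly 40% of the time instead of 5%, so the usual interpretation of the selected feature's p-value no longer holds.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, k, n_sim = 100, 10, 1000        # observations, candidate features, simulations
selected_pvals = []

for _ in range(n_sim):
    X = rng.normal(size=(n, k))
    y = rng.normal(size=n)         # pure noise: no feature is truly related to y
    pvals = [sm.OLS(y, sm.add_constant(X[:, j])).fit().pvalues[1] for j in range(k)]
    selected_pvals.append(min(pvals))   # keep only the most "significant" feature

# An honest p-value falls below 0.05 about 5% of the time under the null;
# the p-value of the selected feature falls below 0.05 far more often (~40% here).
print("fraction of selected p-values < 0.05:",
      np.mean(np.array(selected_pvals) < 0.05))
```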