My dataset has 500 variables. How can I quickly check which independent variables are significant for my dependent variable or my model? What I usually do is import some of them and see which ones have a small p-value.
- Try running a lasso regression with increasing weight on the penalty term; it performs well at variable selection. – Yang Song Mar 15 '18 at 20:59
- What question are you trying to answer? What do you mean when you ask whether they are significant? More context is needed; what makes sense in one circumstance may not in another. – Aaron Mar 15 '18 at 21:29
- Possible duplicate of [Variable selection for predictive modeling really needed in 2016?](https://stats.stackexchange.com/questions/215154/variable-selection-for-predictive-modeling-really-needed-in-2016) – kjetil b halvorsen Mar 15 '18 at 21:37
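The lasso approach from the first comment can be sketched in plain Python. This is a minimal sketch, not a library implementation. It assumes hypothetical synthetic data in which only features 0 to 2 affect the response. It fits the lasso by cyclic coordinate descent on a grid of increasing penalties `alpha` and then picks `alpha` by 5-fold cross-validation. The helper names `fit_lasso` and `cv_mse` are illustrative only. In practice, scikit-learn's `LassoCV` performs the same cross-validated search for you.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; values inside [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_lasso(X, y, alpha, n_sweeps=50):
    """Minimize (1/2n)*||y - b0 - sum_j b_j x_j||^2 + alpha*sum_j |b_j|
    by cyclic coordinate descent. X is a list of feature columns."""
    n, p = len(y), len(X)
    x_means = [sum(col) / n for col in X]
    y_mean = sum(y) / n
    Xc = [[v - m for v in col] for col, m in zip(X, x_means)]  # centered columns
    sq = [sum(v * v for v in col) / n for col in Xc]            # (1/n)*||x_j||^2
    beta = [0.0] * p
    resid = [v - y_mean for v in y]                             # residual at beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            if sq[j] == 0.0:
                continue
            xj = Xc[j]
            # (1/n) * x_j . (partial residual that excludes feature j)
            rho = sum(a * r for a, r in zip(xj, resid)) / n + beta[j] * sq[j]
            new = soft_threshold(rho, alpha) / sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * a for r, a in zip(resid, xj)]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta

def cv_mse(X, y, alpha, k=5):
    """Mean squared prediction error of the lasso over k interleaved folds."""
    n = len(y)
    total = 0.0
    for f in range(k):
        test_idx = range(f, n, k)
        train_idx = [i for i in range(n) if i % k != f]
        b0, beta = fit_lasso([[col[i] for i in train_idx] for col in X],
                             [y[i] for i in train_idx], alpha)
        for i in test_idx:
            pred = b0 + sum(b * col[i] for b, col in zip(beta, X))
            total += (y[i] - pred) ** 2
    return total / n

# Hypothetical synthetic data: 10 standard-normal features, only 0-2 matter.
rng = random.Random(0)
n, p = 100, 10
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
true_beta = [3.0, -2.0, 1.5] + [0.0] * (p - 3)
y = [sum(b * X[j][i] for j, b in enumerate(true_beta)) + rng.gauss(0, 1)
     for i in range(n)]

# Increase the penalty and watch which features survive.
alphas = [0.01, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
for a in alphas:
    _, beta = fit_lasso(X, y, a)
    print(f"alpha={a:<5} kept={[j for j, b in enumerate(beta) if b != 0.0]}")

# Pick the penalty by cross-validation and report the surviving features.
best_alpha = min(alphas, key=lambda a: cv_mse(X, y, a))
_, best_beta = fit_lasso(X, y, best_alpha)
selected = [j for j, b in enumerate(best_beta) if b != 0.0]
print("CV-chosen alpha:", best_alpha, "selected features:", selected)
```

As `alpha` grows, coefficients shrink and weakly related features are set to exactly zero. The features that survive at the cross-validated `alpha` are the candidates to keep. With 500 real predictors, standardize them first, because the L1 penalty is not scale-invariant.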
1 Answer
It's hard to answer this without more information about your data and your question. However, running univariate tests and judging each variable by its p-value ignores the intercorrelations and multivariate interactions that may exist among your predictors. Using regularization with cross-validation, as a commenter suggested, is a more principled approach to feature selection.

HEITZ