I have a dependent variable (DV) and about 200 independent variables (IVs). I want to find out which 10-20 of these variables are most important for the DV. I could do:

  1. PCA - However, it would only tell me which 10-20 variables (or combinations of them) carry most of the variance in the dataset, regardless of the DV
  2. Regression and checking p-values - However, I don't know what functional form the relationship follows, so I can't just use simple linear regression

Any ideas would be highly appreciated!

COOLSerdash
huhahihi
    Hi huhahihi, and welcome to the site! What you describe is a very common problem in statistics, and there are numerous posts on this site that address it. For example, have a look at [this post](http://stats.stackexchange.com/questions/4272/when-to-use-regularization-methods-for-regression/4274#4274) regarding regularization methods such as lasso or ridge regression. – COOLSerdash Jun 16 '15 at 11:53
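
To make the lasso suggestion in the comment concrete, here is a minimal sketch of how an L1 penalty shrinks the coefficients of unhelpful IVs to exactly zero, so the surviving variables are the selected ones. Everything in it is an assumption for illustration: the data are synthetic (20 IVs, of which only columns 0 and 5 drive the DV), the penalty value is picked by hand, and the helper names (`standardise`, `soft_threshold`, `lasso_coordinate_descent`) are hypothetical. It fits a linear model, so it does not by itself address the concern about an unknown curve.

```python
import random

def standardise(X):
    """Centre each column and scale it to unit variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardised and y is centred.
    """
    n, p = len(X), len(X[0])
    coefs = [0.0] * p
    residual = list(y)  # equals y - Xb while all coefficients are zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that excludes column j
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * coefs[j]
            new_coef = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_coef - coefs[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                coefs[j] = new_coef
    return coefs

# Synthetic data (an assumption): only IVs 0 and 5 influence the DV.
rng = random.Random(0)
n_samples, n_features = 200, 20
X_raw = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y_raw = [3 * row[0] - 2 * row[5] + rng.gauss(0, 0.5) for row in X_raw]

X = standardise(X_raw)
y_mean = sum(y_raw) / n_samples
y = [value - y_mean for value in y_raw]

coefs = lasso_coordinate_descent(X, y, penalty=0.3)
selected = [j for j, c in enumerate(coefs) if abs(c) > 1e-8]
print("selected IVs:", selected)
```

In practice the penalty would be chosen by cross-validation rather than by hand (for example with scikit-learn's `LassoCV`), and a larger penalty keeps fewer variables, which is one way to get down to 10-20 out of 200.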

0 Answers