I am fitting a conditional logistic regression model (clogit from the survival package) with around 150 independent variables, which are highly correlated. I need to choose the combination of variables that gives the best model. How should I select that combination?

Can I use PCA for that?

amoeba
Vish
  • See this http://stats.stackexchange.com/questions/27300 for a general discussion of using PCA for feature selection. – amoeba Mar 22 '16 at 20:48

1 Answer

The paper "Regularization Paths for Conditional Logistic Regression: The clogitL1 Package" by Reid and Tibshirani gives a lasso solution for conditional logistic regression.

Instead of maximizing the conditional logistic log-likelihood directly, they maximize the log-likelihood minus an L1 penalty (the lasso penalty). The penalty is a tuning parameter $\lambda$ times the L1 norm of the coefficient vector, $$\lambda \sum_{j=1}^p |\beta_j|.$$
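
To make the penalized objective concrete, here is a minimal Python sketch that only evaluates it for a given coefficient vector. It is not the clogitL1 implementation and does not fit anything. The function name, the data layout, and the restriction to exactly one case per stratum are assumptions made for illustration.

```python
import math

def penalized_cond_loglik(strata, beta, lam):
    """Conditional log-likelihood minus lam times the L1 norm of beta.

    strata: list of strata; each stratum is a list of (x, y) pairs, where x
    is a list of covariate values and y is 1 for the case, 0 for a control.
    Assumption for this sketch: exactly one case per stratum.
    """
    loglik = 0.0
    for stratum in strata:
        # Linear predictor x . beta for every subject in the stratum
        etas = [sum(xj * bj for xj, bj in zip(x, beta)) for x, _ in stratum]
        # Linear predictor of the (single) case
        case_eta = sum(eta for eta, (_, y) in zip(etas, stratum) if y == 1)
        # log of the sum of exp(eta), shifted by the max for stability
        m = max(etas)
        log_denom = m + math.log(sum(math.exp(e - m) for e in etas))
        loglik += case_eta - log_denom
    # Lasso penalty: lam * sum_j |beta_j|
    penalty = lam * sum(abs(b) for b in beta)
    return loglik - penalty

# Hypothetical toy data: two strata, two covariates
strata = [
    [([1.0, 0.0], 1), ([0.0, 1.0], 0)],
    [([2.0, 1.0], 1), ([0.0, 0.0], 0), ([1.0, 1.0], 0)],
]
unpenalized = penalized_cond_loglik(strata, [1.0, 0.0], 0.0)
penalized = penalized_cond_loglik(strata, [1.0, 0.0], 0.5)
```

For the same coefficients, the penalized value is lower than the unpenalized one by exactly $\lambda \sum_j |\beta_j|$, so a nonzero coefficient is kept only if it improves the log-likelihood by more than it costs in penalty.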

This penalty shrinks some of the coefficients to exactly 0, which tells you that the corresponding variables can be removed from your model. Larger values of $\lambda$ remove more variables from the model.

They also implemented an R package called clogitL1.

Andrew
  • +1, but allow me a couple of suggestions (that apply to all answers here). First, when citing a paper use full title, author name and year instead of "this papers". This is good for two reasons: (i) links can rot and (ii) your answer can be found via searching for the authors or title. Second, the person who asked this question seems to be not very experienced, so a brief explanation of what "lasso" is might be helpful. Third, longer answers are much appreciated here; so some summary of what is going on in this paper would improve your answer greatly. – amoeba Mar 22 '16 at 20:46