LASSO prediction model question

Asked Oct 22 '14 at 16:38

I am trying to build a prediction model with 33 predictors (brain metabolite levels in various regions) and 8 observations (cognitive test scores), which is a p >> n problem, using LASSO in MATLAB (the lassoglm function). When I run LASSO 100 times with 5-fold cross-validation, the runs give different models: some with good predictive power and some with poor predictive power. My questions are:

1. Can I build a prediction model with 8 observations and 33 predictors?
2. Assuming the answer to question 1 is yes (or maybe), which of the models from the 100 runs should I pick to get good predictive power? Can I pick the one with the minimum error? Is overfitting a problem here?
3. Once I have selected a model, which method is good for validating it: $R^2$ or something else?
4. Can I average the $R^2$ of the top 10 models?
5. I also tried first selecting the predictors that correlate well with the response variable (correlation above 0.5) and then running LASSO with only those predictors. I obtained better models in some of my runs. Is this acceptable, and are there any publications that support this idea?
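A minimal sketch of the screening step in question 5, in Python rather than MATLAB. Everything here is hypothetical: the data are simulated, the λ value is arbitrary, and `lasso_fit`, `screen` and `loo_mse` are illustrative names, not lassoglm internals (the LASSO is a plain Gaussian fit by coordinate descent, without predictor standardization). Only the 0.5 threshold comes from the question. The sketch compares screening once on all 8 observations before leave-one-out CV with repeating the screening inside each training set. When screening is done once, the held-out response has already helped choose the predictors, so that CV error is typically optimistic; the simulated response is pure noise, so any apparent skill is spurious.

```python
import math
import random

def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_fit(X, y, lam, max_iter=200, tol=1e-6):
    """Coordinate descent for (1/(2n))*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]  # centred predictors
    r = [v - ym for v in y]  # residuals while every coefficient is 0
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: its coefficient stays 0
            # (1/n) * <x_j, partial residual with b_j added back>
            rho = sum(Xc[i][j] * (r[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    b0 = ym - sum(m * c for m, c in zip(xm, b))  # intercept on the original scale
    return b0, b

def pearson(a, b):
    """Pearson correlation; 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb) if saa > 0 and sbb > 0 else 0.0

def screen(X, y, threshold):
    """Indices of predictors whose |correlation| with y exceeds threshold."""
    return [j for j in range(len(X[0])) if abs(pearson([row[j] for row in X], y)) > threshold]

def loo_mse(X, y, lam, threshold, screen_inside):
    """Leave-one-out MSE of 'screen by correlation, then LASSO'."""
    n = len(y)
    keep_once = screen(X, y, threshold)  # uses all n responses, held-out ones included
    sse = 0.0
    for i in range(n):
        Xt = [row for k, row in enumerate(X) if k != i]
        yt = [v for k, v in enumerate(y) if k != i]
        keep = screen(Xt, yt, threshold) if screen_inside else keep_once
        if keep:
            b0, b = lasso_fit([[row[j] for j in keep] for row in Xt], yt, lam)
            pred = b0 + sum(X[i][j] * c for j, c in zip(keep, b))
        else:
            pred = sum(yt) / len(yt)  # nothing passed the screen: predict the mean
        sse += (y[i] - pred) ** 2
    return sse / n

# Hypothetical pure-noise data: y is unrelated to every one of the 33 predictors.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(33)] for _ in range(8)]
y = [rng.gauss(0, 1) for _ in range(8)]
print("screen once, then CV:   ", round(loo_mse(X, y, 0.1, 0.5, False), 3))
print("screen inside each fold:", round(loo_mse(X, y, 0.1, 0.5, True), 3))
print("mean only (no X):       ", round(loo_mse(X, y, 0.1, 1.1, True), 3))
```

A threshold of 1.1 lets no predictor through, so the last line is a mean-only baseline; only the inside-the-fold version mimics what happens to genuinely new subjects.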

edited Oct 22 '14 at 16:52 by gung - Reinstate Monica

asked Oct 22 '14 at 16:38 by Cemil

I'm not sure how much sense 5-fold CV makes with only 8 data points. Why not 8-fold (leave-one-out) CV? Most likely, I think you can't do much with so few data. – gung - Reinstate Monica Oct 22 '14 at 16:54
What would an average $R^2$ of ten different models actually *mean*? – whuber Oct 22 '14 at 17:30
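A minimal sketch of the leave-one-out suggestion, in Python rather than MATLAB. The 8 × 33 data set is simulated, and the λ grid and all function names (`lasso_fit`, `random_folds`, `loo_folds`, `best_lambda`, `predictive_r2`) are hypothetical, not lassoglm options. In this sketch a random 5-fold partition of 8 observations has folds of sizes 2, 2, 2, 1, 1, and each new partition can select a different λ, which is presumably one source of the run-to-run differences in the question; leave-one-out has only one possible partition, so repeating it returns the same λ. The block also pools the eight held-out predictions into a single predictive $R^2 = 1 - \mathrm{PRESS}/\mathrm{TSS}$, where PRESS is the sum of squared leave-one-out errors and TSS is the total sum of squares of $y$ about its mean. Because the same leave-one-out errors also choose λ, this $R^2$ is still optimistic; an honest estimate would need nested cross-validation.

```python
import random

def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_fit(X, y, lam, max_iter=200, tol=1e-6):
    """Coordinate descent for (1/(2n))*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]  # centred predictors
    r = [v - ym for v in y]  # residuals while every coefficient is 0
    col_sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: its coefficient stays 0
            # (1/n) * <x_j, partial residual with b_j added back>
            rho = sum(Xc[i][j] * (r[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - b[j]
            if step != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * step
                b[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    b0 = ym - sum(m * c for m, c in zip(xm, b))  # intercept on the original scale
    return b0, b

def cv_mse(X, y, lam, folds):
    """Mean squared error pooled over every held-out observation."""
    sse = 0.0
    for test in folds:
        train = [i for i in range(len(y)) if i not in test]
        b0, b = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
        sse += sum((y[i] - b0 - sum(x * c for x, c in zip(X[i], b))) ** 2 for i in test)
    return sse / len(y)

def random_folds(n, k, seed):
    """Random k-fold partition of 0..n-1; n = 8, k = 5 gives sizes 2, 2, 2, 1, 1."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def loo_folds(n):
    return [[i] for i in range(n)]  # the only leave-one-out partition

def best_lambda(X, y, lambdas, folds):
    return min(lambdas, key=lambda lam: cv_mse(X, y, lam, folds))

def predictive_r2(X, y, lam):
    """1 - PRESS/TSS from the n leave-one-out predictions (can be negative)."""
    n = len(y)
    ym = sum(y) / n
    press = n * cv_mse(X, y, lam, loo_folds(n))  # sum of squared LOO errors
    tss = sum((v - ym) ** 2 for v in y)
    return 1.0 - press / tss

# Hypothetical simulated data with the question's shape: n = 8, p = 33.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(33)] for _ in range(8)]
y = [2 * row[0] - row[1] + rng.gauss(0, 1) for row in X]
lambdas = [2.0 * 0.7 ** k for k in range(8)]  # arbitrary grid, largest first

for seed in range(5):  # repeated 5-fold CV: the chosen lambda may change
    print("5-fold, seed", seed, "->", round(best_lambda(X, y, lambdas, random_folds(8, 5, seed)), 3))
lam_loo = best_lambda(X, y, lambdas, loo_folds(8))  # identical on every repeat
print("LOO ->", round(lam_loo, 3), "pooled LOO R^2:", round(predictive_r2(X, y, lam_loo), 3))
```

Whether the five 5-fold runs actually disagree depends on the simulated data; the point is only that the random partition is an extra source of variation that leave-one-out removes. A predictive $R^2$ below 0 means the model predicts held-out subjects worse than their training-set mean.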

0 Answers