
I am trying to build a prediction model with 33 predictors (brain metabolite levels in various regions) and only 8 observations of the response (cognitive test scores), i.e. a $p \gg n$ problem, using LASSO in MATLAB (the lassoglm function). When I run LASSO 100 times with 5-fold cross-validation, I get many different models: some have good predictive power and some have poor predictive power. My questions are:

  • Can I build a prediction model at all with 8 observations and 33 predictors?
  • Assuming the answer to my first question is yes (or maybe), which of the 100 runs should I pick to get a model with good predictive power? Can I simply pick the one with the minimum cross-validation error? Do we have an overfitting problem here?
  • Once I have selected a model, which method is appropriate for validating it: $R^2$ or something else?
  • Can I average the $R^2$ of the top 10 models?
  • I also tried first selecting the predictors that correlate well with the response (correlation above 0.5) and then running LASSO on those predictors only. This gave better models in some of my runs. Is this acceptable, and are there any publications that support this approach?
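For concreteness, here is a minimal Python sketch of the kind of procedure described above (it is not MATLAB's lassoglm). It assumes a Gaussian LASSO with the common $\frac{1}{2n}\lVert y - \beta_0 - X\beta\rVert^2 + \lambda\lVert\beta\rVert_1$ objective on standardized predictors, fitted by plain coordinate descent. The data are synthetic (random numbers, not real metabolite data), and all function names (`lasso_cd`, `screen_predictors`, `repeated_cv`, `make_synthetic`) are hypothetical. It repeats 5-fold CV with a fresh random split each time and records the chosen $\lambda$ and its CV error. It also shows the 0.5 correlation filter applied in two ways: inside each training fold, so held-out observations never influence which predictors are kept, or once on all 8 observations before CV, which lets the held-out data leak into predictor selection.

```python
import math
import random

def soft_threshold(z, lam):
    """Soft-thresholding operator used by LASSO coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, max_sweeps=100, tol=1e-6):
    """Fit (1/(2n))||y - b0 - Zb||^2 + lam*||b||_1 on standardized columns Z of X."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        v = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(math.sqrt(v) if v > 0 else 1.0)  # constant column stays at beta = 0
    Z = [[(row[j] - means[j]) / sds[j] for row in X] for j in range(p)]  # columns
    y_mean = sum(y) / n                      # intercept, since Z is centered
    r = [yi - y_mean for yi in y]            # residuals for beta = 0
    beta = [0.0] * p
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            # (1/n) z_j'z_j = 1, so the coordinate update is a plain soft-threshold
            rho = sum(a * b for a, b in zip(Z[j], r)) / n + beta[j]
            delta = soft_threshold(rho, lam) - beta[j]
            if delta != 0.0:
                r = [ri - delta * zi for ri, zi in zip(r, Z[j])]
                beta[j] += delta
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    def predict(row):
        return y_mean + sum(b * (x - m) / s for b, x, m, s in zip(beta, row, means, sds))
    return predict, beta

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [x - ma for x in a], [x - mb for x in b]
    den = math.sqrt(sum(x * x for x in da) * sum(x * x for x in db))
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0

def screen_predictors(X, y, threshold):
    """Indices of columns whose |correlation| with y exceeds threshold."""
    return [j for j in range(len(X[0]))
            if abs(pearson([row[j] for row in X], y)) > threshold]

def repeated_cv(X, y, lambdas, n_repeats=100, k=5, screen=None,
                screen_outside=False, seed=0):
    """Return (best lambda, CV MSE) for each repeat of k-fold CV."""
    rng, n = random.Random(seed), len(y)
    cols = list(range(len(X[0])))
    if screen is not None and screen_outside:
        cols = screen_predictors(X, y, screen)  # leaky: uses all observations
    results = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)                        # new random split every repeat
        folds = [idx[f::k] for f in range(k)]
        sse = [0.0] * len(lambdas)
        for test in folds:
            train = [i for i in idx if i not in test]
            Xtr = [[X[i][j] for j in cols] for i in train]
            ytr = [y[i] for i in train]
            keep = list(range(len(cols)))
            if screen is not None and not screen_outside:
                keep = screen_predictors(Xtr, ytr, screen)  # training fold only
            for li, lam in enumerate(lambdas):
                if keep:
                    predict, _ = lasso_cd([[row[j] for j in keep] for row in Xtr], ytr, lam)
                else:                           # nothing passed the filter
                    predict = lambda row, m=sum(ytr) / len(ytr): m
                for i in test:
                    sse[li] += (y[i] - predict([X[i][cols[j]] for j in keep])) ** 2
        mse = [s / n for s in sse]
        best = min(range(len(lambdas)), key=mse.__getitem__)
        results.append((lambdas[best], mse[best]))
    return results

def make_synthetic(n=8, p=33, seed=0):
    """Hypothetical data: y depends on the first two of p random predictors."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2 * row[0] - row[1] + rng.gauss(0, 1) for row in X]
    return X, y

if __name__ == "__main__":
    X, y = make_synthetic()
    lambdas = [0.1, 0.2, 0.5, 1.0, 2.0]
    for label, kw in [("no screening", {}),
                      ("screen inside folds", {"screen": 0.5}),
                      ("screen before CV (leaky)", {"screen": 0.5, "screen_outside": True})]:
        res = repeated_cv(X, y, lambdas, n_repeats=10, **kw)
        errs = [mse for _, mse in res]
        print(f"{label}: chosen lambdas {sorted({lam for lam, _ in res})}, "
              f"CV MSE {min(errs):.2f} to {max(errs):.2f}")
```

With only 8 observations, each test fold holds one or two points, so the chosen $\lambda$ and its error can change from one random split to the next. This is one way to see why runs of the same procedure give different models.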
gung - Reinstate Monica
Cemil
