
I've seen several questions here about the assumptions of lasso regression, but none of them is quite what I'm looking for, so here goes.

I'm using the LASSO technique for multiple linear regression, specifically with glmnet() and the "Hitters" data set in R. What I would like to do is confirm that the model I am getting out is still adhering to the necessary assumptions of linear modeling.

I saw another post on another site indicating that I should create an lm() object using the coefficients that "survive" the LASSO, but I'm not sure that's right.

So: How can I check the assumptions of linear regression when using the LASSO technique?
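For concreteness, here is a minimal sketch of the refit idea mentioned above, so the two sets of estimates can be compared side by side. It assumes the `Hitters` data come from the `ISLR` package, that `Salary` is the outcome, and that the penalty is chosen by `cv.glmnet()` at `lambda.min`; none of those choices are stated in the question, and the object names are only illustrative.

```r
library(glmnet)
library(ISLR)  # assumption: Hitters is taken from the ISLR package

# Drop rows with missing Salary and build a numeric design matrix
hitters <- na.omit(Hitters)
x <- model.matrix(Salary ~ ., data = hitters)[, -1]  # remove intercept column
y <- hitters$Salary

# Cross-validated lasso (alpha = 1); the seed only fixes the CV folds
set.seed(1)
cv_fit <- cv.glmnet(x, y, alpha = 1)

# Lasso coefficients at lambda.min, as a named numeric vector
b <- as.matrix(coef(cv_fit, s = "lambda.min"))[, 1]
kept <- setdiff(names(b)[b != 0], "(Intercept)")

# OLS refit on only the predictors that "survive" the lasso
ols_fit <- lm(y ~ ., data = data.frame(y = y, x[, kept, drop = FALSE]))

# Side-by-side comparison: the lasso estimates are shrunken, the OLS ones are not
print(cbind(lasso = b[kept], ols = coef(ols_fit)[kept]))
```

The two columns generally differ, so diagnostics run on the `lm()` object describe the refitted OLS model rather than the lasso fit itself.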

  • An OLS linear model with the “surviving” LASSO coefficients will not result in the same coefficient estimates as the LASSO regression. I explain a bit [here](https://stats.stackexchange.com/a/555163/247274), but it’s not hard to find an example when you do the calculations in software; just try it and voilà! – Dave Dec 07 '21 at 22:10
  • @Dave So how can I evaluate the assumptions of linear regression in this context? – Jairaj Ranchod Dec 07 '21 at 22:24
  • What assumptions do you want to evaluate? How would you evaluate them in an OLS linear model? – Dave Dec 07 '21 at 22:39
  • I want to check the distribution of the residuals specifically. In OLS, I would create a plot of the final model. I actually just now generated a prediction vector and compared it against the actual outcome. My problem now is that the residuals "fan out". Would a log transformation of the outcome be appropriate there? – Jairaj Ranchod Dec 08 '21 at 00:56
  • Without knowing the details, it’s hard to say. However, what would you do for an OLS regression? Remember that LASSO is just another method for estimating the parameter vector $\beta$ in $\mathbb E[Y\vert X]=X\beta$. – Dave Dec 08 '21 at 01:10
  • It depends on the objective. I understand LASSO to be more widely used for prediction accuracy than inference. Since the "fan out" is symmetrical, I'm not compromising prediction accuracy according to this: https://academic.macewan.ca/burok/Stat378/notes/remedies.pdf So I would leave it alone. On the other hand, if analysis of the confidence intervals was a priority, I would sacrifice the interpretability of the coefficients and make the transformation. – Jairaj Ranchod Dec 08 '21 at 01:23
  • Maybe "I'm not compromising prediction accuracy" isn't quite right; the transformation does appear to make the model fit better. – Jairaj Ranchod Dec 08 '21 at 01:26
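The residual check discussed in the comments (predict from the lasso fit, subtract from the observed outcome, and look for a "fan" shape) could be sketched roughly as follows, once on the raw outcome and once on its log. This is only an illustration under assumptions: `Hitters` from the `ISLR` package, `Salary` as the outcome, `lambda.min` as the penalty, and a hypothetical helper named `lasso_residuals()` that is not part of glmnet.

```r
library(glmnet)
library(ISLR)  # assumption: Hitters is taken from the ISLR package

hitters <- na.omit(Hitters)
x <- model.matrix(Salary ~ ., data = hitters)[, -1]
y <- hitters$Salary

# Hypothetical helper: fit a cross-validated lasso and return
# in-sample fitted values and residuals at lambda.min
lasso_residuals <- function(x, y) {
  cv_fit <- cv.glmnet(x, y, alpha = 1)
  fitted <- as.numeric(predict(cv_fit, newx = x, s = "lambda.min"))
  data.frame(fitted = fitted, resid = y - fitted)
}

set.seed(1)
res_raw <- lasso_residuals(x, y)       # outcome on the original scale
res_log <- lasso_residuals(x, log(y))  # outcome on the log scale

# Residuals vs fitted: a widening spread suggests non-constant variance
op <- par(mfrow = c(1, 2))
plot(res_raw$fitted, res_raw$resid, main = "Salary",
     xlab = "Fitted", ylab = "Residual")
abline(h = 0, lty = 2)
plot(res_log$fitted, res_log$resid, main = "log(Salary)",
     xlab = "Fitted", ylab = "Residual")
abline(h = 0, lty = 2)
par(op)
```

These are in-sample residuals from a shrunken fit, so they are best read as a rough visual check of spread rather than as a formal test of the OLS assumptions.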
