
Let's say I have the typical ANOVA where X has 3 categories and I one-hot encode it. I then have: $$Y = b_1X_1 + b_2X_2 + b_3X_3\text{ ,}$$

where $b_1, b_2, \text{ and } b_3$ are basically how much $\text{mean}(Y \mid X)$ deviates from 0 in each category (and whether that deviation is significant or not).

Now let's say I fit a LASSO and it so happens that one of the coefficients goes to 0, so my resulting model is: $$Y = b_1X_1 + b_2X_2\text{ .}$$

How do I interpret $b_1$ and $b_2$? Like how do I relate that to means?

Thanks!

Carl
confused
  • https://stats.stackexchange.com/questions/209009/how-to-treat-categorical-predictors-in-lasso – Maxtron Sep 25 '21 at 00:44
  • @Maxtron I like that link and agree that the OP should read it, though I do not believe this is a duplicate. – Dave Sep 25 '21 at 00:48

1 Answer


I would say that you interpret the coefficients in exactly the same way (and this does not apply only to the dummy variables). After all, the LASSO estimator is just another way to guess what the population values are, much as the following are all acceptable (defensible) ways to guess the population variance, even though only the second is unbiased:

$$ \dfrac{\sum_{i=1}^n(X_i-\bar X)^2}{n} $$

$$ \dfrac{\sum_{i=1}^n (X_i-\bar X)^2}{n-1} $$

$$ \dfrac{\sum_{i=1}^n(X_i-\bar X)^2}{n+1} $$

Someone could make an argument for any of these. Assuming a Gaussian distribution, the first is the maximum likelihood estimator, and statisticians often like maximum likelihood estimators. The second is unbiased (not just for Gaussians, either), so that’s nice. The third has the lowest mean squared error of the three (again under the Gaussian assumption), so there is a sense in which it tends to be closer to the true value, even though it is biased.
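If you want to see that trade-off numerically, here is a minimal simulation sketch. The sample size, true variance, number of replications, and seed are arbitrary choices of mine for illustration, and the data are assumed Gaussian.

```python
import numpy as np

# Illustrative settings (assumptions, not from the question):
# Gaussian data, n = 10 observations per sample, true variance 4.
rng = np.random.default_rng(0)
n, sigma2, reps = 10, 4.0, 200_000

# Draw many samples and compute the sum of squared deviations for each.
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

# Compare the three divisors: n (MLE), n - 1 (unbiased), n + 1.
bias, mse = {}, {}
for d in (n, n - 1, n + 1):
    est = ss / d
    bias[d] = est.mean() - sigma2
    mse[d] = ((est - sigma2) ** 2).mean()
    print(f"divisor {d:>2}: bias = {bias[d]:+.3f}, MSE = {mse[d]:.3f}")
```

In a run like this, the $n-1$ divisor should show a bias near zero, while the $n+1$ divisor should show the most negative bias and the smallest mean squared error.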

By doing LASSO regression, you are, by analogy, picking that third variance estimator. You’re still doing it to guess what the regression coefficients are, and the population coefficients (that you’re estimating) have the same interpretation, no matter how you estimate them (or if you estimate them at all).
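To connect this back to means in your one-hot, no-intercept setup, here is a rough sketch. Everything in it is an assumption for illustration: the simulated group means, the group sizes, the penalty `alpha`, and the objective scaling $\frac{1}{2n}\lVert y - Xb\rVert^2 + \alpha\lVert b\rVert_1$, which I minimize with a few lines of plain coordinate descent rather than any particular library. Because the dummy columns do not overlap, each LASSO coefficient should come out as the group's sample mean shrunk toward 0 by $\alpha n / n_j$.

```python
import numpy as np

# Hypothetical data: 3 groups of 50, made-up true means, unit-variance noise.
rng = np.random.default_rng(1)
n_per_group, alpha = 50, 0.2
true_means = np.array([0.1, 2.0, -1.5])

groups = np.repeat(np.arange(3), n_per_group)
X = np.eye(3)[groups]                      # one-hot columns, no intercept
y = true_means[groups] + rng.normal(size=groups.size)
n = y.size

def soft_threshold(z, t):
    """Shrink z toward 0 by t, setting it to exactly 0 if |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Coordinate descent for (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1.
b = np.zeros(3)
for _ in range(100):
    for j in range(3):
        r = y - X @ b + X[:, j] * b[j]     # residual ignoring column j
        b[j] = soft_threshold(X[:, j] @ r, alpha * n) / (X[:, j] @ X[:, j])

# Closed form for non-overlapping dummies: shrunken group means.
group_means = np.array([y[groups == j].mean() for j in range(3)])
closed_form = soft_threshold(group_means, alpha * n / n_per_group)

print("group means:     ", np.round(group_means, 3))
print("LASSO estimates: ", np.round(b, 3))
print("shrunken means:  ", np.round(closed_form, 3))
```

With these settings the threshold is $\alpha n / n_j = 0.6$, so any group whose sample mean is within 0.6 of zero gets a coefficient of exactly 0. The zero is a shrunken estimate of that group's mean, and the other coefficients are still estimates of their groups' means, just biased toward 0.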

Dave