
Regularization techniques like LASSO, ridge, etc. are usually taught alongside regression. When performing LASSO or other penalty-based regularization, do the same assumptions as for linear regression need to hold first? I.e., Y is linear in the coefficients, Y | X is normally distributed, observations are iid, etc.

Or can we run LASSO on the raw data, and once we have the reduced feature set, transform those features so that they satisfy the assumptions of a linear regression?

I would imagine that, for large predictor sets, it would be tedious to check each feature beforehand. But at the same time, if the features are not pre-processed to fit the assumptions of a linear model, the coefficient estimates wouldn't be accurate, and so I don't know how reliable our LASSO results would be. Thanks!

confused

2 Answers


Ridge regression is defined in terms of applying an $L_2$ penalty to the weights of the model:

$$ \operatorname{arg\,min}_{\boldsymbol{\beta}} \| y - \mathbf{X}\boldsymbol{\beta} \|^2_2 + \lambda \| \boldsymbol{\beta} \|^2_2 $$

LASSO uses an $L_1$ penalty instead, dropout is likewise applied to neural network weights, etc. Regularization is about making models less complex, not the data. Weights are part of the model and make sense only together with the other weights; for example, you couldn't "copy and paste" weights between different regression models. So the fact that regularization has zeroed out some weights for one model does not mean that those features would be unimportant for another model.
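A minimal sketch of that point (my own illustration on simulated data, using scikit-learn's `Lasso` and `Ridge`, not anything from the question): the $L_1$ penalty drives some coefficients of that particular fitted model exactly to zero, while the $L_2$ penalty only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
# Only the first three features actually drive y in this simulation.
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ beta_true + rng.normal(scale=1.0, size=n)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: some coefficients become exactly 0
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: coefficients shrink but stay nonzero

print("LASSO coefficients:", np.round(lasso.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))
```

Which coefficients end up at zero depends on the penalty strength and on the other features in this model, which is exactly why the zeroed-out set is not a model-independent verdict on the features.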

On the other hand, there are model-agnostic algorithms that can be applied to the data to select a "best" set of features to be used with different models. The purpose of such algorithms is to find those "best" features, but different algorithms may select different features, and you never have a guarantee that the selected features are "ultimately best".
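One example of such a model-agnostic selector (my choice for illustration, on simulated data) is ranking features by a univariate score such as mutual information with the target, which makes no reference to any downstream model:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, mutual_info_regression

# Simulated data with 3 informative features out of 10.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

# Keep the 3 features with the highest estimated mutual information with y.
selector = SelectKBest(score_func=mutual_info_regression, k=3).fit(X, y)
print("Selected feature indices:", selector.get_support(indices=True))
```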

Tim

Yes, the assumptions apply. But why are you transforming variables in order to make the data fit the model? Instead of changing the data, change the model.

Quantile regression and robust regression, for example, do not assume normally distributed residuals.
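A small sketch of fitting those two alternatives (statsmodels is my choice of library here, and the heavy-tailed simulated errors are just to make the point):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=300)
# Deliberately heavy-tailed, non-normal errors.
y = 2.0 + 0.5 * x + rng.standard_t(df=2, size=300)
X = sm.add_constant(x)

median_fit = sm.QuantReg(y, X).fit(q=0.5)                     # conditional median, no normality assumption
robust_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()   # M-estimator that downweights outliers

print(median_fit.params)
print(robust_fit.params)
```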

On the other hand, if you have nonlinear relationships, you will have to model them, whether you use LASSO or ridge and whatever kind of regression model you choose. But you should want to model them: they could be very important and interesting.
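One way to do that (a sketch of my own, on simulated data, not a prescription): expand the predictors with nonlinear terms, polynomial terms here, and let a cross-validated LASSO decide which terms survive.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(300, 2))
# True relationship is nonlinear in the first predictor.
y = 1.0 + X[:, 0] ** 2 - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=300)

model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # adds squares and interactions
    StandardScaler(),
    LassoCV(cv=5),
)
model.fit(X, y)
print(model[-1].coef_)  # nonzero weights expected mainly on x1 and the x0^2 term
```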

Peter Flom
  • Well, in class we've spent some time getting the data to fit the linear regression (transformations like log, sqrt, polynomial, etc.), so I figured that's what people do in practice. And then when it comes to time series, differencing and other transformations so that the data fit ARMA model assumptions. I guess I got the wrong impression. – confused Jun 24 '20 at 23:40
  • People do this. But they shouldn't. It's a holdover from when OLS was the only practicable method. But time series are a different set of methods. Differencing there is needed, as far as I know, to achieve stationarity (but time series are not something I know about). – Peter Flom Jun 25 '20 at 11:05