I am running a lasso model to predict a continuous variable, and I have continuous and categorical inputs.

  • For centering and scaling, is it correct that these should be applied only to the continuous variables?
  • Can the dummy-variable step be skipped for the categorical inputs? It doesn't seem to make sense when it comes to scoring.
    • For example, if the training set has a categorical variable called Cats with levels A-Z, dummy coding creates one column for each of A-Z. But if my scoring set is missing one of these levels, the columns no longer match and scoring will fail. I'm looking for some guidance.
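
To make the setup concrete, here is a minimal sketch of the situation described above. It assumes scikit-learn, which the question does not name, and the column names `x_cont` and `Cats`, the toy data, and the `alpha` value are all hypothetical. The continuous column is centered and scaled, the categorical column is dummy coded, and the encoder learns its levels from the training set only, so a scoring set with missing or unseen levels still produces the same columns. Leaving the dummy columns unscaled is just one possible choice here, not a recommendation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)

# Hypothetical training data: one continuous and one categorical predictor.
train = pd.DataFrame({
    "x_cont": rng.normal(size=60),
    "Cats": np.tile(list("ABCD"), 15),  # levels A-D, 15 rows each
})
y_train = (
    2.0 * train["x_cont"]
    + (train["Cats"] == "A").astype(float)
    + rng.normal(scale=0.1, size=60)
)

# Scale only the continuous column; dummy code the categorical column.
# The encoder stores the training levels, so the output columns are fixed.
# handle_unknown="ignore" encodes an unseen level as all zeros instead of failing.
preprocess = ColumnTransformer([
    ("scale", StandardScaler(), ["x_cont"]),
    ("dummies", OneHotEncoder(handle_unknown="ignore"), ["Cats"]),
])
model = Pipeline([("prep", preprocess), ("lasso", Lasso(alpha=0.01))])
model.fit(train, y_train)

# Scoring set: levels C and D are absent, and level E was never seen in training.
score = pd.DataFrame({"x_cont": [0.3, -1.2, 0.8], "Cats": ["A", "B", "E"]})
preds = model.predict(score)

# Number of columns the model sees at scoring time (1 scaled + 4 dummies).
n_features = model.named_steps["prep"].transform(score).shape[1]
```

Because the preprocessing is fitted once on the training data and reused inside the pipeline, the scoring set never has to contain every level for the column layout to match.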
Sycorax
  • I mean an independent predictor variable; the target variable is continuous. – Mathew Lionnet Jun 25 '21 at 01:46
  • Does this answer your question? [Ridge\Lasso -- Standardization of dummy indicators](https://stats.stackexchange.com/questions/359015/ridge-lasso-standardization-of-dummy-indicators) That covers the centering/scaling. [Group lasso](https://en.wikipedia.org/wiki/Lasso_(statistics)#Group_lasso) provides a way to handle all levels of a multi-level categorical predictor, although details might depend on your particular data set and purpose for modeling. – EdM Jun 25 '21 at 02:29
  • Yes, that kind of helps, thanks! – Mathew Lionnet Jun 25 '21 at 03:10

0 Answers