I am running a lasso model to predict a continuous variable, and I have continuous and categorical inputs.
- Centering and scaling: is it correct that these should be applied only to the continuous variables?
- Dummy variables: can the dummy-encoding step be skipped for the categorical variables? It doesn't seem to make sense at scoring time.
- For example, if the training set has a categorical variable called Cats with levels A-Z, dummy encoding creates one column for each of A-Z. If my scoring set is missing one of these levels, its dummy columns no longer match the training set and scoring falls over. I'm looking for some guidance.
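
To make the last point concrete, here is a minimal sketch of the mismatch in plain Python rather than any particular modelling library. The helper names `learn_levels` and `one_hot` are hypothetical, and a three-level category stands in for A-Z. It assumes the scoring set is encoded either on its own or with the levels learned from the training set.

```python
def learn_levels(values):
    """Return the sorted distinct levels seen in the given data."""
    return sorted(set(values))


def one_hot(values, levels):
    """Encode each value as a 0/1 row with one column per level in `levels`."""
    return [[1 if value == level else 0 for level in levels] for value in values]


train_cats = ["A", "B", "C", "A"]  # training data contains levels A, B and C
score_cats = ["A", "C"]            # scoring data happens to contain no B

# Encoding the scoring set on its own: the matrix ends up one column short.
train_levels = learn_levels(train_cats)
naive_score = one_hot(score_cats, learn_levels(score_cats))

# Reusing the levels learned from training keeps the column layout fixed;
# the missing level B simply becomes an all-zero column.
aligned_score = one_hot(score_cats, train_levels)

print(len(train_levels), len(naive_score[0]), len(aligned_score[0]))
print(aligned_score)
```

Encoded on its own, the scoring set gets two columns against the three the model was trained on, which is the failure I'm describing. With the training levels reused, the width stays at three.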