I'm trying to build a regression model with an MLP to predict a continuous variable: the income of a movie. My set of regressors consists of around 15 binary variables (I've one-hot encoded some categorical variables) and two continuous variables: the length of the movie in minutes and its production budget. My questions are: 1) How should I treat these two continuous regressors? Is it enough to standardize them, or should I create bins and then one-hot encode them? If I standardize them, should I also standardize the binary regressors? 2) After taking care of the regressors, what should I do with the dependent variable? Should I standardize it or leave it as it is?

Sorry if my questions seem stupid, but I'm just getting started with prediction. Thank you.

1 Answer

1) Don't convert your continuous regressors into categorical ones, so no binning. Binning discards information and does not make backpropagation's job any easier. Some form of standardization is a good idea, since MLP activations are quite sensitive to the scale of their inputs.

2) You don't need to standardize binary regressors; in many situations it doesn't make much sense. For further discussion, see here. You might, however, choose a different encoding that is centered on zero, e.g. $\pm1$.

3) Standardizing the target may be a good idea. For example, if you use regularization, the magnitude of the values the weights must take strongly affects the loss function. If you're not using any kind of regularization, or any method that places a prior on the weights, it doesn't matter much, but it also doesn't hurt.
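
The three points above can be sketched with plain NumPy. This is only an illustration under assumptions: the data is synthetic, and the names `length`, `budget`, `income`, `fit_scaler`, and `to_original_units` are hypothetical stand-ins, not anything from the question. The scaling statistics should come from the training split only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical stand-in data: 15 binary dummies, then length (minutes) and budget.
X_bin = rng.integers(0, 2, size=(n, 15)).astype(float)
length = rng.normal(110.0, 20.0, size=n)
budget = rng.lognormal(17.0, 1.0, size=n)
income = 2.5 * budget + rng.normal(0.0, 1e6, size=n)  # made-up target


def fit_scaler(a):
    """Return (mean, std) of the given (training) data, column-wise."""
    return a.mean(axis=0), a.std(axis=0)


# 1) Standardize the continuous regressors; no binning.
X_cont = np.column_stack([length, budget])
mu_x, sd_x = fit_scaler(X_cont)
X_cont_std = (X_cont - mu_x) / sd_x

# 2) Leave the binary columns as 0/1, or optionally recode them to -1/+1.
X_bin_pm = 2.0 * X_bin - 1.0

# Inputs for the MLP: binary columns untouched, continuous columns standardized.
X_model = np.hstack([X_bin, X_cont_std])

# 3) Standardize the target for training ...
mu_y, sd_y = fit_scaler(income)
y_std = (income - mu_y) / sd_y


def to_original_units(pred_std):
    """... and map predictions made on the standardized scale back to income units."""
    return pred_std * sd_y + mu_y
```

Whatever MLP implementation you use would then be trained on `X_model` and `y_std`, with `to_original_units` applied to its predictions, and with `mu_x`, `sd_x`, `mu_y`, `sd_y` reused unchanged on test data.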

gunes