I have a dataset with 100 predictor variables (95 continuous and 5 categorical) and 1 continuous target variable. Density plots of the continuous predictors suggest that they are all approximately normally distributed.
As a starting point, my goal is to build a linear regression model that uses all 100 predictor variables to predict the target variable.
- How do I know whether I need to "normalize" my data (the continuous predictor variables)?
- If I determine that I need to normalize my predictor variables, do I also need to normalize my target variable?
- How do I determine which method of normalization is appropriate? Is this a local (per variable) decision or global (one normalization approach for all variables)?
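As background for the first question: in ordinary least squares with an intercept, z-scoring a continuous predictor is a linear transformation, so the fitted values and predictions do not change. Only the coefficients change scale, and with them their interpretation. The sketch below checks this on simulated data. The data, the helper names `fit_ols` and `standardize`, and the use of Python/NumPy in place of R are all illustrative assumptions. In R, the same comparison could be made with base `scale()` and `lm()`.

```python
import numpy as np

def fit_ols(X, y):
    """OLS with an intercept column (hypothetical helper); returns (coefficients, fitted values)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta, X1 @ beta

def standardize(X):
    """Z-score each column: subtract its mean, then divide by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Simulated data (assumption for illustration): 200 rows, 5 continuous predictors on very different scales
rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=[1, 5, 20, 0.1, 3], size=(200, 5))
y = X @ np.array([2.0, -0.5, 0.1, 30.0, 1.0]) + rng.normal(size=200)

beta_raw, fitted_raw = fit_ols(X, y)
beta_std, fitted_std = fit_ols(standardize(X), y)

# Fitted values agree up to floating-point error
print(np.allclose(fitted_raw, fitted_std))
# Each standardized slope equals the raw slope times that predictor's standard deviation
print(np.allclose(beta_std[1:], beta_raw[1:] * X.std(axis=0)))
```

Normalization therefore matters mainly for interpreting or comparing coefficients, and for methods that are sensitive to predictor scale (for example penalized regression), rather than for the predictions of plain OLS.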
I am using R; if there are any packages that can help, please let me know.
I am also unsure whether to decide on normalization before or after building the regression model. For example, I could skip normalization, build and cross-validate the model, and, if I don't like the results, repeat the process with different normalization choices. To me, this approach seems like fudging the process until I get a reasonable result.
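One common way to avoid tuning preprocessing against the validation results is to fix the preprocessing choice in advance and estimate its parameters (here, column means and standard deviations) on each training fold only. Those training-fold statistics are then applied unchanged to the held-out fold. The sketch below does this for plain OLS. The function `kfold_rmse`, the simulated data, and the choice of Python/NumPy instead of R are hypothetical, for illustration only. With the same folds, standardized and unstandardized OLS give the same cross-validated RMSE. This happens because the per-fold standardization is an affine transformation, so it does not change OLS predictions.

```python
import numpy as np

def kfold_rmse(X, y, k=5, standardize=False, seed=0):
    """K-fold cross-validated RMSE for OLS with an intercept (hypothetical helper).

    If standardize=True, column means and SDs are estimated on each training
    fold only and then applied to the held-out fold, so held-out rows never
    influence the preprocessing.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    sq_errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        X_tr, X_te = X[train_idx], X[test_idx]
        if standardize:
            mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)  # training-fold statistics only
            X_tr, X_te = (X_tr - mu) / sd, (X_te - mu) / sd
        A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
        A_te = np.column_stack([np.ones(len(X_te)), X_te])
        beta, *_ = np.linalg.lstsq(A_tr, y[train_idx], rcond=None)
        sq_errors.append((y[test_idx] - A_te @ beta) ** 2)
    return np.sqrt(np.mean(np.concatenate(sq_errors)))

# Simulated data (assumption for illustration): 300 rows, 10 continuous predictors
rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=rng.uniform(0.5, 10.0, size=10), size=(300, 10))
y = X @ rng.normal(size=10) + rng.normal(size=300)

rmse_raw = kfold_rmse(X, y, standardize=False)
rmse_std = kfold_rmse(X, y, standardize=True)
print(np.isclose(rmse_raw, rmse_std))  # same folds (seed=0), same OLS predictions
```

Both calls use the same `seed`, so the folds are identical and any difference in RMSE would come only from the preprocessing.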