
I have a dataset with 100 predictor variables (95 continuous and 5 categorical) and one continuous target variable. Based on their density plots, the continuous predictor variables are all normally distributed.

As a start, my goal is to build a linear regression model that uses the 100 predictor variables to predict the target variable.

  1. How do I know if I need to "normalize" my data (the predictor variables that are continuous)?
  2. If I determine that I need to normalize my predictor variables, do I also need to normalize my target variable?
  3. How do I determine which method of normalization is appropriate? Is this a local (per variable) decision or global (one normalization approach for all variables)?
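For ordinary least squares specifically, rescaling a continuous predictor by a linear (affine) transformation such as the z-score does not change the fitted values; only the coefficients change units. The same holds for the target if you standardize it and then back-transform the predictions. The minimal sketch below illustrates this for a single predictor in plain Python with made-up toy data. The helper names `ols_fit` and `zscore` are illustrative, not from any library. In R, base `scale()` performs the same centering and scaling before a call to `lm()`.

```python
from statistics import mean, stdev

def ols_fit(x, y):
    """Least-squares fit of y ~ a + b*x for one predictor; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def zscore(v):
    """Center to mean 0 and scale to sample standard deviation 1."""
    m, s = mean(v), stdev(v)
    return [(vi - m) / s for vi in v], m, s

# Hypothetical toy data, made up for illustration.
x = [2.0, 3.5, 5.1, 6.0, 7.8, 9.3]
y = [10.1, 13.0, 17.2, 18.9, 24.5, 27.7]

# 1) Fit on the raw scale.
a_raw, b_raw = ols_fit(x, y)
pred_raw = [a_raw + b_raw * xi for xi in x]

# 2) Standardize predictor and target, fit, then undo the target scaling.
zx, mx, sx = zscore(x)
zy, my, sy = zscore(y)
a_z, b_z = ols_fit(zx, zy)
pred_back = [my + sy * (a_z + b_z * zxi) for zxi in zx]

# The standardized slope is the raw slope re-expressed in SD units ...
print(abs(b_z - b_raw * sx / sy) < 1e-9)
# ... and the back-transformed predictions match the raw-scale fit.
print(max(abs(p - q) for p, q in zip(pred_raw, pred_back)) < 1e-9)
```

This invariance does not carry over to penalized fits such as ridge or lasso, whose penalty depends on coefficient size. It also does not carry over to nonlinear transforms such as a log, which change the model itself. Those cases are where a per-variable choice of transformation actually matters.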

I am using R; if there are any packages that can help, please let me know.

I am not sure whether I should decide on normalization before or after building the regression model. For example, I could skip normalization, build the model, and cross-validate it, and if I don't like the results, repeat the process while tinkering with the normalization. To me, this approach seems like fudging the process until I get a reasonable result.
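One common safeguard against this kind of circularity is to treat any normalization as part of the model-fitting pipeline. You estimate the centering and scaling constants from the training fold only and then apply them unchanged to the held-out fold, so the cross-validation score reflects the whole procedure. The sketch below shows this for one predictor column in plain Python. The helpers `fit_scaler`, `apply_scaler` and `k_fold_indices` are hypothetical names chosen for illustration, and the folds are contiguous rather than shuffled, for clarity.

```python
from statistics import mean, stdev

def fit_scaler(train):
    """Learn centering/scaling constants from training values only."""
    return mean(train), stdev(train)

def apply_scaler(values, params):
    """Standardize values using previously learned (mean, sd)."""
    m, s = params
    return [(v - m) / s for v in values]

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, for clarity)."""
    size, rem = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < rem else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

# Hypothetical single predictor column, for illustration only.
x = [4.2, 5.0, 3.9, 6.1, 5.5, 4.8, 7.0, 6.4, 5.9]

for fold in k_fold_indices(len(x), 3):
    held_out = set(fold)
    train = [x[i] for i in range(len(x)) if i not in held_out]
    test = [x[i] for i in fold]
    params = fit_scaler(train)            # statistics from the training part only
    train_z = apply_scaler(train, params)
    test_z = apply_scaler(test, params)   # held-out data reuse training statistics
    print(fold, [round(v, 2) for v in test_z])
```

If several preprocessing options are compared this way and the best one is kept, the winning score is itself optimistically biased. A final untouched test set, or nested cross-validation, is still needed for an honest estimate.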

Jane Wayne
  • You have said that your continuous predictors are all normally distributed, so what's the problem here? There is no assumption in regression that any predictor is normally distributed; if that were so, then use of indicator variables would be quite out of court, rather than being recommended in every decent regression text. Setting that aside, what is your method for judging "is normally distributed"? "Looks roughly symmetric on a density plot" could be good enough, but we can't tell. I've never seen a situation in which all of many continuous predictors were normally distributed. – Nick Cox Oct 28 '15 at 07:31
  • There are many threads here on regression assumptions and on transformations to normality. I can't see here that you have a really new question. – Nick Cox Oct 28 '15 at 07:32
  • See e.g. http://stats.stackexchange.com/questions/16381/what-is-a-complete-list-of-the-usual-assumptions-for-linear-regression – Nick Cox Oct 28 '15 at 08:02
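The first comment says that predictors need not be normal. A 0/1 indicator illustrates this well: its distribution has only two points, yet least squares handles it without difficulty. With an intercept and a single indicator, the fitted slope is exactly the difference between the two group means, and the intercept is the mean of the group coded 0. Where a normality assumption is made for inference, it concerns the errors, not the predictors. The minimal plain-Python sketch below uses made-up data, and `ols_fit` is an illustrative helper rather than a library function.

```python
from statistics import mean

def ols_fit(x, y):
    """Least-squares fit of y ~ a + b*x for one predictor; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data: d is a 0/1 indicator (a two-point, clearly non-normal predictor).
d = [0, 0, 0, 0, 1, 1, 1]
y = [3.1, 2.8, 3.5, 3.0, 5.2, 4.9, 5.6]

intercept, slope = ols_fit(d, y)
group0 = mean(yi for di, yi in zip(d, y) if di == 0)
group1 = mean(yi for di, yi in zip(d, y) if di == 1)

# Slope equals the difference in group means; intercept equals the group-0 mean.
print(round(slope, 4), round(group1 - group0, 4))
print(round(intercept, 4), round(group0, 4))
```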
