
I want to conduct a multiple regression analysis on Health Survey Data and have subset my one dependent variable and seven independent variables into a new data frame.

Whilst I now understand how to carry out each of the following steps, I am confused about the order in which to do them, as I fear that doing one before another may have implications for my interpretation that I am not aware of:

(I have listed them in the order in which I believe they should be done.)

  • Hot Deck Imputation for missing values
  • Create Dummy Variables for Categorical Variables with more than two levels
  • Check for Non-Linearity between independent variables and dependent variable
  • Transformations to variables which show non-linearity
  • Check for normal distribution of independent variables
  • Check for and add interaction terms to model

Comments:

  • Very closely related: https://stats.stackexchange.com/questions/32600/in-what-order-should-you-do-linear-regression-diagnostics/32625#32625. It is rare to need to check for normality of independent variables or even their residuals; for many (if not most) purposes it suffices to check for outlying residuals and for the absence of consequential skewness in their distribution. – whuber Dec 26 '17 at 21:01
  • Linear regression does not assume that either the dependent or the independent variables are normally distributed. Rather, it assumes that the residuals are normally distributed. Also: hasn't hot deck imputation fallen out of favor compared to multiple imputation? – Alexis Dec 26 '17 at 22:55
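
As a rough illustration of the first two steps in the list, here is a minimal sketch assuming pandas and NumPy and a hypothetical data frame with a numeric `bmi`, a numeric `age` and a three-level categorical `region` (all names and values are made up, not taken from the survey). It uses one simple variant of hot deck imputation, a random hot deck within adjustment classes: each missing `bmi` is copied from a randomly chosen observed `bmi` in the same `region`. Imputing before dummy coding keeps the categorical variable as a single column while it is used to define donor classes, and `drop_first=True` keeps k - 1 dummies for a k-level factor so the dummies are not collinear with the intercept. A single hot-deck fill treats imputed values as if they were observed; multiple imputation, raised in the comments, is designed to carry that extra uncertainty into the standard errors.

```python
from typing import Optional

import numpy as np
import pandas as pd


def random_hot_deck(df: pd.DataFrame, column: str,
                    by: Optional[str] = None, seed: int = 0) -> pd.DataFrame:
    """Return a copy of `df` in which missing values of `column` are replaced
    by randomly drawn observed values (donors) of the same column, taken only
    from rows in the same `by` group when `by` is given (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    # Adjustment classes: the whole frame, or one class per level of `by`
    # (rows whose `by` value is itself missing are left untouched).
    classes = [(None, out.index)] if by is None else out.groupby(by).groups.items()
    for _, idx in classes:
        values = out.loc[idx, column]
        recipients = values[values.isna()].index
        donors = values.dropna().to_numpy()
        if len(recipients) and len(donors):
            out.loc[recipients, column] = rng.choice(donors, size=len(recipients))
    return out


# Hypothetical example data with two missing BMI values.
df = pd.DataFrame({
    "bmi": [22.0, np.nan, 31.5, 27.0, np.nan, 24.5],
    "age": [34, 51, 47, 29, 62, 40],
    "region": ["north", "south", "south", "east", "north", "east"],
})

# Step 1: impute first, using region as the donor class.
imputed = random_hot_deck(df, "bmi", by="region")

# Step 2: then dummy-code the 3-level factor; drop_first keeps k - 1 columns.
design = pd.get_dummies(imputed, columns=["region"], drop_first=True, dtype=float)
print(design)
```

In this toy data each region with a missing value has exactly one donor, so the fill is deterministic; with real data the `seed` controls which donor is drawn.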

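Following the comments, the normality and outlier checks concern the residuals rather than the raw predictors, so they can only be run after a model has been fitted. The sketch below is a minimal illustration assuming NumPy only and simulated data (not the survey): it fits ordinary least squares with an interaction entered as a product column, flags observations whose residual exceeds 3 residual standard errors, and reports the sample skewness of the residuals. The helper name `ols_residual_checks` is hypothetical, the cut-off of 3 is an assumed rule of thumb, and the residuals are simply scaled by the residual standard error rather than leverage-adjusted (studentized).

```python
import numpy as np


def ols_residual_checks(X: np.ndarray, y: np.ndarray, cutoff: float = 3.0):
    """Fit y ~ X by ordinary least squares (X must already contain an
    intercept column) and return (coefficients, indices of outlying
    residuals, sample skewness of the residuals). Hypothetical helper."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma = np.sqrt(resid @ resid / (n - p))  # residual standard error
    scaled = resid / sigma                    # scaled, not leverage-adjusted
    outliers = np.flatnonzero(np.abs(scaled) > cutoff)
    centered = resid - resid.mean()
    skewness = np.mean(centered**3) / np.mean(centered**2) ** 1.5
    return beta, outliers, skewness


# Simulated data (not the survey): two predictors plus their interaction.
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + 0.8 * x1 * x2 + rng.normal(scale=0.5, size=n)
y[10] += 6.0  # plant one gross outlier

# Design matrix: intercept, main effects, and the interaction as a product column.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta, outliers, skewness = ols_residual_checks(X, y)
print("coefficients:", np.round(beta, 2))
print("outlying rows:", outliers)
print("residual skewness:", round(skewness, 2))
```

The same residual vector can also be plotted against each predictor; systematic curvature there is a common sign that a transformation or polynomial term may be worth trying, which suggests the non-linearity check can be done on an initial fit.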