I want to run a multiple regression analysis on health survey data and have subset the one dependent and seven independent variables into a new data frame.
I now understand how to carry out each of the following steps, but I am unsure what order to do them in; I worry that doing one before another could affect my interpretation in ways I am not aware of.
(I have listed them in the order I believe they should be done.)
- Hot deck imputation for missing values
- Create dummy variables for categorical variables with more than two levels
- Check for non-linearity between each independent variable and the dependent variable
- Transform variables that show non-linearity
- Check whether the independent variables are normally distributed
- Check for interaction terms and add them to the model
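
To make the question concrete, here is a minimal sketch of the steps in the order listed above, written in Python with pandas and NumPy. Everything in it is an assumption for illustration: the toy data and the column names (`bmi`, `age`, `region`) are made up, the hot deck is a simple random variant that draws donors from the same column, and the log transform and the single interaction term are arbitrary examples. It shows the order I am proposing, not an order I know to be correct, and it leaves out the normality check.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical toy data standing in for the survey subset (names are made up).
df = pd.DataFrame({
    "bmi": [22.1, 27.5, np.nan, 31.0, 24.3, 29.8, 26.0, np.nan],
    "age": [25, 40, 33, np.nan, 51, 62, 45, 38],
    "region": ["north", "south", None, "east", "south", "north", "east", "south"],
})

def hot_deck(col, rng):
    """Simple random hot deck: fill each missing value with a randomly
    drawn observed value (a "donor") from the same column."""
    col = col.copy()
    missing = col.isna()
    donors = col[~missing].to_numpy()
    col.loc[missing] = rng.choice(donors, size=int(missing.sum()))
    return col

def r_squared(x, y, degree):
    """R^2 of a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    return 1 - resid.var() / y.var()

# 1. Hot deck imputation, column by column.
imputed = pd.DataFrame({name: hot_deck(df[name], rng) for name in df.columns})

# 2. Dummy variables for the categorical predictor (first level is the reference).
data = pd.get_dummies(imputed, columns=["region"], drop_first=True, dtype=float)

# 3. Crude non-linearity check: does a quadratic term improve the fit much?
x, y = data["age"].to_numpy(), data["bmi"].to_numpy()
r2_lin, r2_quad = r_squared(x, y, 1), r_squared(x, y, 2)

# 4. Example transformation (log is an arbitrary choice here).
data["log_age"] = np.log(data["age"])

# 5. Example interaction term between a continuous predictor and a dummy.
data["age_x_south"] = data["age"] * data["region_south"]

# Fit the model by ordinary least squares with an intercept column.
X = data[["log_age", "region_north", "region_south", "age_x_south"]]
design = np.column_stack([np.ones(len(X)), X.to_numpy()])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
```

With real data I would normally look at residual plots rather than only comparing the two R² values; the comparison is just the shortest check I could write down.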