I have run a few tests/methods on my data and am getting contradictory results.
I have a linear model: reg1 = lm(weight ~ height + age + gender (categorical) + several other variables).
If I model each term linearly, i.e. with no squared or interaction terms, and run vif(reg1), 4 variables have a VIF >15. If I delete the variable with the highest VIF and re-run it, the VIFs change and now only 2 variables are >15. I repeat this until I'm left with 20 variables (out of 30), all below 10. If I use stepwise selection directly on reg1, it does not delete the 'highest VIF' variable. I don't understand how VIF tells me 'what' is linearly dependent on 'what variable' and how (and I cannot seem to find this information despite googling for ages).
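To make the VIF part of the question concrete, here is a minimal sketch of how I understand the number is computed for a single-column predictor: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The data are simulated and the variable leg_length is hypothetical (it is not in my real data set); I built it to be nearly a linear function of height so that the effect is visible.

```r
# Simulated, hypothetical data: NOT my real data set
set.seed(1)
n <- 200
height <- rnorm(n, 170, 10)
age <- rnorm(n, 40, 12)
# leg_length is deliberately almost a linear function of height
leg_length <- 0.5 * height + rnorm(n, 0, 1)
weight <- 0.6 * height + 0.1 * age + rnorm(n, 0, 5)
dat <- data.frame(weight, height, age, leg_length)

predictors <- c("height", "age", "leg_length")

# VIF_j = 1 / (1 - R_j^2): regress each predictor on the remaining ones
vif_by_hand <- sapply(predictors, function(p) {
  aux <- lm(reformulate(setdiff(predictors, p), response = p), data = dat)
  1 / (1 - summary(aux)$r.squared)
})
print(vif_by_hand)

# Coefficients of the auxiliary regression for the predictor with the
# largest VIF: these show which other predictors it is (nearly) a
# linear combination of
worst <- names(which.max(vif_by_hand))
aux_worst <- lm(reformulate(setdiff(predictors, worst), response = worst),
                data = dat)
print(summary(aux_worst)$coefficients)
```

If this reading is right, the auxiliary regression is the closest thing to an answer to 'what depends on what', but I am not sure how it extends to my categorical variables or why stepwise selection ignores it.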
Furthermore, when I look at the residual plots, most appear horizontal except a few which show an upside-down U shape (none of these variables have high VIFs). Does this mean a transformation is needed? (I removed outliers, leverage points etc. - but now there seem to be more!)
I also fitted a model with interaction terms: reg2 = lm(weight ~ (height + age + gender (categorical) + several other variables)^2).
If I run vif on this, all of the terms are >500!
What else I have tried (without cutting any variables): (1) When I run diagnostics, the errors seem correlated according to the Durbin-Watson statistic, which I took to indicate that the model is not linear... however... (2) Box-Cox gives lambda = 1, so no transformation is needed. (3) LASSO gives the lowest Mallows' Cp on the full 30-variable model (i.e. least squares). (4) Ridge regression gives lambda = 0, which did surprise me.
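For item (1), this is a minimal sketch of the statistic I am looking at, computed by hand on simulated data (again hypothetical, not my data). As far as I understand, it only measures correlation between residuals in adjacent rows, so it depends on how the rows are ordered; here the rows are sorted by x and a straight line is deliberately fitted to a curved relationship.

```r
# Simulated, hypothetical example: a straight line fitted to curved data
set.seed(2)
n <- 200
x <- sort(runif(n, 0, 10))           # rows are ordered by x
y <- 1 + 0.5 * x^2 + rnorm(n)        # the true relationship is quadratic
fit <- lm(y ~ x)                     # misspecified on purpose
e <- residuals(fit)

# Durbin-Watson statistic: values near 2 suggest no lag-1 correlation,
# values near 0 suggest strong positive correlation between neighbours
dw <- sum(diff(e)^2) / sum(e^2)
print(dw)

# Same residuals in random row order: the serial pattern disappears
idx <- sample(n)
dw_shuffled <- sum(diff(e[idx])^2) / sum(e^2)
print(dw_shuffled)
```

In this toy case the low value comes from the missed curvature combined with the row ordering, which is why I am unsure how to read the statistic on my own data.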
I'm getting really confused about this data. To determine a suitable model for weight, should I be looking just at linear terms, or at linear and interaction terms (remember there are 30 variables, so there are choose(30, 2) = 435 pairwise interaction terms)?
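A quick way to count the terms that the ^2 formula generates, assuming 30 predictors that each contribute a single column (the names x1 to x30 are placeholders, not my real variable names):

```r
# Placeholder predictor names; a factor with more than two levels would
# add more columns than this count suggests
vars <- paste0("x", 1:30)
f <- as.formula(paste("~ (", paste(vars, collapse = " + "), ")^2"))

term_labels <- attr(terms(f), "term.labels")
n_terms <- length(term_labels)            # main effects plus pairwise interactions
n_interactions <- sum(grepl(":", term_labels))
print(c(total = n_terms, interactions = n_interactions))
```

So in R the ^2 operator expands to main effects plus all pairwise interactions rather than squared terms, which is still a lot of terms relative to my sample.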
When I check which terms are significant in reg2, only 12 predictors and 6 interaction terms seem significant (AIC is lowest with this combination after I run step). Should I just use this 'new model with deleted variables/interaction terms' and run all my checks on it (e.g. stepwise, LASSO etc.), or should I run them on the entire model?
I'm quite lost as to which steps to follow to find a suitable model for weight from these variables.
My final question is: once I have the model, how do I test/prove it's the best, or at least a decent, model?
Any help would really be appreciated.