In stepwise regression, how to interpret non-significant variables?

Question

I have more than 15 IVs, such as age, gender, education, first language, technology proficiency, health condition, etc., and one of my DVs is health literacy level, which is measured with a standard questionnaire.

I'm using multiple regression to see which IVs might predict my DV. Since I don't have a specific assumption, I chose stepwise regression (forward selection) to find the best model.

I got a model with the lowest AIC. The model is significant (p < .001, $R^2$ = .25) and consists of health condition, first language, technology proficiency, and age. But among the four variables, only health condition is significant (p = .005); first language and technology proficiency are borderline (p = .07 to .08), and age is not significant at all. So I'm wondering how I should interpret the two borderline variables and the non-significant variable in this case.

I'm asking this question not only because the two variables are borderline, but also because they contribute to the best model (i.e., the models without them have a higher AIC).

Maybe my understanding of stepwise regression is not correct, and I should pick another type of analysis. Or maybe I should run a hierarchical regression (e.g., an incremental F test) to see if there are more layers to this relationship. Does anyone have any ideas?

Thanks in advance!

Note that removing variables always reduces $R^2$, whether or not they're relevant. — gung - Reinstate Monica, Aug 21 '19 at 20:30
Since you shouldn't be using stepwise, the question is still a duplicate. — Peter Flom, Aug 22 '19 at 11:46
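The first comment's point about $R^2$ can be checked numerically. The sketch below uses simulated data (all variable names and values are made up for illustration, not taken from the question) and fits ordinary least squares with and without a predictor that is pure noise. In-sample $R^2$ of the smaller model is never higher than that of the larger one, which is why criteria such as AIC add a penalty per parameter.

```python
import numpy as np

def ols_r2(X, y):
    """In-sample R^2 of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
    resid = y - A @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

# Simulated data: one relevant predictor, one irrelevant predictor.
rng = np.random.default_rng(42)
n = 100
relevant = rng.normal(size=n)
noise_only = rng.normal(size=n)                    # unrelated to y by construction
y = 1.0 + 0.5 * relevant + rng.normal(size=n)

r2_full = ols_r2(np.column_stack([relevant, noise_only]), y)
r2_reduced = ols_r2(relevant.reshape(-1, 1), y)

print(f"R^2 with the noise predictor:    {r2_full:.4f}")
print(f"R^2 without the noise predictor: {r2_reduced:.4f}")
```

The drop in $R^2$ when the noise predictor is removed is typically tiny, but it is never a gain, so a change in $R^2$ alone says nothing about whether a variable belongs in the model.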

score 4 · Accepted Answer · answered Aug 20 '19 at 23:33

If you are interested in interpreting coefficients and significance/p-values, don't use stepwise regression. See this post. In fact, stepwise regression is basically always a bad idea in this day and age when better model-selection techniques are easily computable. If you want to easily interpret the model and say which coefficients are truly "significant," come up with a model that you find scientifically reasonable, fit that model, maybe tweak the model a little bit based on diagnostics, adjust the p-values of the coefficients you're interested in for multiple testing, and there you go. If you want to find out which variables are important, try LASSO. You can get significance out of a LASSO model, but it ain't easy.
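To make the LASSO suggestion concrete, here is a minimal sketch of LASSO fitted by coordinate descent on simulated data. Everything in it is an assumption for illustration: the function names `soft_threshold` and `lasso_cd`, the data, and the fixed penalty `lam = 0.3`. In practice the penalty would normally be chosen by cross-validation with an established implementation rather than set by hand.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200, tol=1e-8):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 on standardised predictors."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # each column: mean 0, variance 1
    yc = y - y.mean()                           # centring removes the intercept
    beta = np.zeros(p)
    resid = yc.copy()                           # residual for the current beta
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # Correlation of predictor j with the residual that excludes it.
            rho = Xs[:, j] @ (resid + Xs[:, j] * beta[j]) / n
            new = soft_threshold(rho, lam)
            resid += Xs[:, j] * (beta[j] - new)  # keep the residual in sync
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta

# Simulated data: 6 candidate predictors, only the first two affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)

beta = lasso_cd(X, y, lam=0.3)                  # lam is a hand-picked value
selected = np.flatnonzero(beta)                 # indices with non-zero coefficients
print("coefficients:", np.round(beta, 3))
print("selected predictors:", selected)
```

The penalty sets the coefficients of weak predictors exactly to zero and shrinks the rest, so selection happens in a single fit rather than through a sequence of significance tests. The shrunken coefficients do not come with ordinary p-values, which is the difficulty the answer alludes to.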

Sheridan Grant


Thanks for the answer! I will take a look at Lasso! – randomcat Aug 21 '19 at 16:13
So I tried Lasso and it included more factors, but when I calculated the R2, it was negative. So I assume Lasso regression doesn't fit my data. But I'm wondering, how come the R2 is positive with a simple multiple regression but negative for Lasso? – randomcat Aug 22 '19 at 00:26
R^2 shouldn't ever be negative, it's a quadratic! Maybe update your question with the model you ran and the results? It's extremely surprising that a stepwise regression-fit model would fit the data well but LASSO wouldn't, so something must be amiss. – Sheridan Grant Aug 22 '19 at 18:49
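One possible reading of the negative value discussed in these comments, offered as an assumption because the thread does not say how it was calculated: if R2 is computed as 1 − SS_res/SS_tot rather than as a squared correlation, it drops below zero whenever the predictions have a larger squared error than simply predicting the mean of the data being scored. That cannot happen for an in-sample least-squares fit with an intercept, but it can happen when a model is scored on held-out data or when its predictions are badly off. A tiny sketch with made-up numbers:

```python
def r_squared(y_true, y_pred):
    """R^2 defined as 1 - SS_res / SS_tot (not a squared correlation)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0, 5.0]               # made-up observed values

r2_perfect = r_squared(y, y)                 # predictions match exactly
r2_mean = r_squared(y, [3.0] * 5)            # always predict the mean
r2_bad = r_squared(y, [5.0, 4.0, 3.0, 2.0, 1.0])  # predictions worse than the mean

print(r2_perfect)  # 1.0
print(r2_mean)     # 0.0
print(r2_bad)      # -3.0
```

Under this definition, a negative value means the predictions being scored do worse than a constant, which usually points to how the predictions were generated or evaluated rather than to the method itself.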
