Interpretation of p-values in multiple regression output

Question

I have data from a survey where I collected demographic information and quiz scores from college students. I ran backward model selection to determine which of my variables affect knowledge. My best model includes the variables Gender, Religion, Political party, School type * Hometown region, Curriculum, and Class size. Now I'm interested in determining whether there are significant differences between the categories of these variables (e.g., Democrats vs. Republicans, Democrats vs. third-party voters, etc.). Looking at the model output,

                                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)                             1.12742    0.11051  10.202  < 2e-16 ***
GenderMale                              0.27761    0.03665   7.574 3.62e-14 ***
ReligionNone                            0.38754    0.04057   9.553  < 2e-16 ***
ReligionPolytheist                      0.03883    0.08426   0.461 0.644918    
PoliticalOther                         -0.05100    0.04064  -1.255 0.209471    
PoliticalRepublican                    -0.26154    0.04931  -5.304 1.13e-07 ***
School_TypePublic                       0.31366    0.11628   2.697 0.006987 ** 
Home_RegionNortheast                    0.42175    0.15982   2.639 0.008318 ** 
Home_RegionSouth                        0.25988    0.12583   2.065 0.038897 *  
Home_RegionWest                         0.47569    0.14199   3.350 0.000808 ***
CurriculumMajor                         0.14399    0.04894   2.942 0.003257 ** 
CurriculumNone                         -0.15485    0.04176  -3.708 0.000209 ***
Class_SizeAbove 400                     0.18364    0.04575   4.014 5.97e-05 ***
Class_SizeBetween 201 and 400           0.10113    0.04470   2.263 0.023664 *  
School_TypePublic:Home_RegionNortheast -0.38518    0.17365  -2.218 0.026545 *  
School_TypePublic:Home_RegionSouth     -0.42587    0.13869  -3.071 0.002136 ** 
School_TypePublic:Home_RegionWest      -0.46535    0.15436  -3.015 0.002573 **

I understand that the intercept corresponds to the reference levels GenderFemale, ReligionMonotheist, PoliticalDemocrat, etc. My question is this: since the intercept combines the reference levels of several variables, can I accurately use the provided p-values to determine whether there are differences between/among genders, religions, political parties, school types, hometown regions, curricula, and class sizes? That is, can I say that there is a significant difference between males and females (P < 0.005), or a non-significant difference between monotheists and polytheists (P = 0.645)?

If so, can I use the relevel function to change the reference level so I can test every pairwise comparison within a variable (e.g., ReligionPolytheist vs. ReligionNone)?

I may be overthinking this, but if someone could clarify this output interpretation for me it would be greatly appreciated.

Thank you,

Sara

Frans Rodenburg · Accepted Answer · 2019-05-08T01:15:22.130

You could indeed use the relevel function to place different categories in the intercept. For variables that are not part of an interaction, you can safely draw conclusions about their main effects, even though the intercept contains the reference categories of several variables. Because of the interaction School_Type * Home_Region, though, there is no longer a single school-type difference. By specifying a model where school type interacts with home region, you imply that the effect of one depends on the value of the other. You can therefore only say what the effect of school type is for a given home region, and vice versa.
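A minimal base-R sketch of both points, run on simulated data: the data frame `d`, its effect sizes, and "Midwest" as the reference region are made up, and only the variable names mirror the question. It uses lm() for simplicity, whereas the z values in the question suggest a glm(); the relevel() logic is the same for either. Refitting with "None" as the reference gives Polytheist vs. None directly, and refitting once per reference region gives the Public-vs-Private effect within each region.

```r
# Hypothetical simulated data standing in for the survey (not the real data);
# variable names mirror the question, effect sizes are invented.
set.seed(1)
n <- 600
d <- data.frame(
  Religion    = factor(sample(c("Monotheist", "None", "Polytheist"), n, replace = TRUE)),
  School_Type = factor(sample(c("Private", "Public"), n, replace = TRUE)),
  Home_Region = factor(sample(c("Midwest", "Northeast", "South", "West"), n, replace = TRUE))
)
d$knowledge <- rnorm(n) + 0.4 * (d$Religion == "None") +
  0.3 * (d$School_Type == "Public") * (d$Home_Region == "Midwest")

# Default coding: Monotheist, Private and Midwest are the reference levels
fit <- lm(knowledge ~ Religion + School_Type * Home_Region, data = d)

# Polytheist vs. None: make "None" the reference level and refit
d2 <- d
d2$Religion <- relevel(d2$Religion, ref = "None")
fit_none <- lm(knowledge ~ Religion + School_Type * Home_Region, data = d2)
summary(fit_none)$coefficients["ReligionPolytheist", ]

# With the interaction, School_TypePublic is the Public-vs-Private effect
# within the reference region only, so refit once per reference region.
school_by_region <- sapply(levels(d$Home_Region), function(r) {
  d3 <- d
  d3$Home_Region <- relevel(d3$Home_Region, ref = r)
  f <- lm(knowledge ~ Religion + School_Type * Home_Region, data = d3)
  coef(f)[["School_TypePublic"]]
})
school_by_region
```

Each entry of `school_by_region` equals School_TypePublic plus the matching School_TypePublic:Home_Region term from the original fit. Applied to the output in the question, the Public-vs-Private estimate is 0.31366 in the reference region, 0.31366 - 0.38518 = -0.07152 in the Northeast, -0.11221 in the South and -0.15169 in the West.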

However, there is a more subtle problem with your approach: By selecting the best fit to your data based on $p$-values, any $p$-values you obtain from that model for inference are biased towards significance. I instead recommend using the original model to make any claims of significance. Backwards selection is a form of step-wise regression. Its problems have been addressed well here, here and here.
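A small simulation can make this bias concrete. In the hypothetical sketch below the outcome is pure noise, so every significant coefficient is a false positive. As a crude stand-in for backward selection (not the exact procedure used in the question), predictors with p < 0.2 in the full model are kept and the model is refit; the share of p < 0.05 among the retained predictors should then come out well above the nominal 5% seen in the full model.

```r
# Hypothetical simulation: y is pure noise, unrelated to all k predictors,
# so every "significant" coefficient is a false positive.
set.seed(42)
n <- 100; k <- 10; reps <- 200
full_sig <- 0; full_tests <- 0
sel_sig  <- 0; sel_tests  <- 0

for (i in seq_len(reps)) {
  X <- as.data.frame(matrix(rnorm(n * k), nrow = n, ncol = k))  # columns V1..V10
  X$y <- rnorm(n)

  # Full model: every predictor is tested
  p_full <- summary(lm(y ~ ., data = X))$coefficients[-1, 4]
  full_sig   <- full_sig + sum(p_full < 0.05)
  full_tests <- full_tests + k

  # Crude selection step: keep predictors with p < 0.2, then refit
  keep <- names(p_full)[p_full < 0.2]
  if (length(keep) == 0) next
  f_sel <- lm(reformulate(keep, response = "y"), data = X)
  p_sel <- summary(f_sel)$coefficients[-1, 4]
  sel_sig   <- sel_sig + sum(p_sel < 0.05)
  sel_tests <- sel_tests + length(keep)
}

# Share of tests with p < 0.05: near 0.05 for the full model,
# much higher for the refit after selection
c(full = full_sig / full_tests, selected = sel_sig / sel_tests)
```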

A final consideration is the large number of comparisons you are suggesting. If you perform more than one significance test, the overall chance of (at least one) false positive will be much larger than the chosen level of significance. You should consider a kind of multiple testing correction, such as Bonferroni or the Benjamini–Hochberg procedure. In R there is a simple function p.adjust for this.
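For example, p.adjust() takes a vector of p-values and a method name. The two values below are the Political contrasts copied from the output in the question; in practice the family would also include the Other vs. Republican comparison and any other tests you run, all adjusted together in one call.

```r
# p-values for the two Political contrasts (vs. Democrat) from the question's output.
# A complete family would also contain Other vs. Republican and any other tests.
p <- c(Other_vs_Democrat = 0.209471, Republican_vs_Democrat = 1.13e-07)

p_bonf <- p.adjust(p, method = "bonferroni")  # multiplies each p by the number of tests, capped at 1
p_bh   <- p.adjust(p, method = "BH")          # Benjamini-Hochberg (false discovery rate)
rbind(raw = p, bonferroni = p_bonf, BH = p_bh)
```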

Thanks Frans! That is a good point regarding the bias in p-values from my reduced model. To clarify, you suggest using the method described above on the full model (i.e., with all variables, not just the significant ones)? On your last point, that was indeed a concern my collaborator and I had. Initially I ran pairwise Wilcoxon rank sum tests for each significant variable, but our main concern was the increased chance of Type I error. I will look into the types of corrections and the p.adjust function. — birdnerd, May 08 '19 at 02:22
You're welcome! That is indeed what I meant. For future reference, if you have a survey question about class size, it might be preferable to simply model it as a number rather than as discretized categories. This uses fewer parameters in your model and, in turn, gains you some power. — Frans Rodenburg, May 08 '19 at 02:25
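A hypothetical illustration of the parameter count, using simulated class sizes: the three-category coding used in the question costs two coefficients besides the intercept, while the numeric version costs one. The cut points follow the category labels in the output, but the simulated data and the label "200 or fewer" for the reference category are assumptions.

```r
# Hypothetical simulated class sizes and scores (not survey data)
set.seed(3)
n <- 300
size  <- sample(20:600, n, replace = TRUE)
score <- 1 + 0.001 * size + rnorm(n)

# Discretized version mimicking the question's categories;
# "200 or fewer" is a made-up label for the reference category
size_cat <- cut(size, breaks = c(-Inf, 200, 400, Inf),
                labels = c("200 or fewer", "Between 201 and 400", "Above 400"))

f_cat <- lm(score ~ size_cat)  # intercept + 2 dummy coefficients
f_num <- lm(score ~ size)      # intercept + 1 slope
length(coef(f_cat))
length(coef(f_num))
```

The numeric version assumes a linear trend in class size; a polynomial or spline term can relax that at the cost of a few extra parameters.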

score 0 · Answer 2 · answered May 08 '19 at 01:45

It is not incorrect to re-estimate the model with different reference categories in order to test the significance of group differences, but it's unnecessarily convoluted.

It may be more elegant to estimate the model once and use a Wald test of whether the coefficients are equal. This is easy in R.

First, fit your model:

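# Use the factor variables, not the dummy names; if your output came from glm(), use that same glm() call here
model <- lm(knowledge ~ Gender + Religion + Political + School_Type * Home_Region + Curriculum + Class_Size, data = surveyData)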

Then use the linearHypothesis function from the car package:

library(car)
linearHypothesis(model, "PoliticalOther = PoliticalRepublican")
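What linearHypothesis() computes for a single restriction can be sketched by hand in base R, which also shows why it agrees with the relevel() approach. The data below are simulated and hypothetical. For an lm() fit the statistic is a t (equivalently F = t^2 with 1 numerator df); for a glm() like the one in the question, the analogous Wald test uses the normal distribution, i.e. pnorm() in place of pt().

```r
# Hypothetical simulated data; Political has levels Democrat (reference), Other, Republican
set.seed(7)
n <- 500
d <- data.frame(Political = factor(sample(c("Democrat", "Other", "Republican"), n, replace = TRUE)))
d$knowledge <- rnorm(n) - 0.3 * (d$Political == "Republican")

model <- lm(knowledge ~ Political, data = d)
b <- coef(model)
V <- vcov(model)

# H0: beta_Other = beta_Republican, written as the contrast L' b = 0
L <- setNames(numeric(length(b)), names(b))
L["PoliticalOther"] <- 1
L["PoliticalRepublican"] <- -1

est    <- sum(L * b)                           # b_Other - b_Republican
se     <- sqrt(drop(t(L) %*% V %*% L))         # Var = V_OO + V_RR - 2 V_OR
t_stat <- est / se
p_val  <- 2 * pt(-abs(t_stat), df = df.residual(model))
c(estimate = est, se = se, t = t_stat, p = p_val)

# Cross-check: the same comparison via relevel() (estimate has the opposite sign)
d$Political <- relevel(d$Political, ref = "Other")
chk <- coef(summary(lm(knowledge ~ Political, data = d)))["PoliticalRepublican", ]
chk
```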
