
Following a comment on this thread, I have a question about interpreting a logistic regression model with significant coefficients but a non-significant likelihood ratio test.

I have a super simple experimental design with one categorical predictor (with 5 conditions) and a binary outcome variable.

Edit: The hypothesis is that there will be a difference between conditions in whether the outcome occurs or not. The outcome represents the presence/absence of a behaviour. However, we did not have directional hypotheses: we didn't know which conditions, exactly, would differ from the control, or whether there would also be differences between the other conditions.

I have constructed a binomial logistic regression which looks like:

summary(model_1)

Call:
glm(formula = binary_outcome ~ condition, family = "binomial", 
    data = dat)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2181  -0.9005  -0.5553   1.1372   1.9728  

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)  
(Intercept)           0.09531    0.43693   0.218   0.8273  
conditioncondition_1 -0.38299    0.62077  -0.617   0.5373  
conditioncondition_2 -0.78846    0.63655  -1.239   0.2155  
conditioncondition_3 -1.01160    0.65134  -1.553   0.1204  
conditioncondition_4 -1.88707    0.76144  -2.478   0.0132 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 135.01  on 104  degrees of freedom
Residual deviance: 126.83  on 100  degrees of freedom
AIC: 136.83

Number of Fisher Scoring iterations: 4

This shows a significant difference when comparing condition 4 to the control condition. However, I understand that this parameterisation relies on a baseline comparison level (i.e., the coefficients represent each of the other 4 conditions compared to the control condition). To get an individual coefficient for each condition instead, I removed the intercept and ran:

summary(model_2)

Call:
glm(formula = binary_outcome ~ condition - 1, family = "binomial", 
    data = dat)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2181  -0.9005  -0.5553   1.1372   1.9728  

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)   
conditioncondition_control  0.09531    0.43693   0.218  0.82732   
conditioncondition_1       -0.28768    0.44096  -0.652  0.51414   
conditioncondition_2       -0.69315    0.46291  -1.497  0.13430   
conditioncondition_3       -0.91629    0.48305  -1.897  0.05784 . 
conditioncondition_4       -1.79176    0.62361  -2.873  0.00406 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 145.56  on 105  degrees of freedom
Residual deviance: 126.83  on 100  degrees of freedom
AIC: 136.83

Number of Fisher Scoring iterations: 4

Again, I can see a significant effect for condition 4. But I guess this only shows that, within that condition, the log-odds of the outcome differ from zero (i.e., roughly equivalent to a one-sample test of whether the probability of the outcome occurring in that condition differs from 0.5)?
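
(For interpretation: plogis() maps these within-condition log-odds back to estimated probabilities; a quick sketch:)

# Each coefficient in model_2 is the log-odds of the outcome within one condition;
# plogis() converts log-odds back to probabilities
plogis(coef(model_2))
# e.g. plogis(-1.79176) is approx. 0.14, the estimated probability in condition 4;
# the Wald test for that coefficient asks whether this probability equals 0.5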

However, if I test the overall model fit / likelihood ratio, it is not significant. I did this in two different ways. First, by running a chi-squared test (CrossTable() here is from the gmodels package):

CrossTable(dat$condition, dat$binary_outcome, chisq = TRUE, digits = 2, sresid = TRUE, expected = TRUE, format = "SPSS", fisher = TRUE)

Pearson's Chi-squared test 
------------------------------------------------------------
Chi^2 =  7.777778     d.f. =  4     p =  0.1000661 
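
(For reference, base R's chisq.test() gives the same Pearson statistic without the gmodels dependency; a minimal sketch:)

# Base-R equivalent of the omnibus Pearson chi-squared test above
chisq.test(table(dat$condition, dat$binary_outcome))
# X-squared = 7.7778, df = 4, p-value = 0.1001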

Second, by constructing a null model and then comparing the null model with model_1 (or model_2; I guess mathematically it is the same), like so:

model_null <- glm(binary_outcome ~ 1, family = "binomial", data = dat)
anova(model_null, model_1, test = "LRT") 
anova(model_null, model_1, test = "Rao")

Analysis of Deviance Table

Model 1: binary_outcome ~ 1
Model 2: binary_outcome ~ condition
  Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
1       104     135.01                       
2       100     126.83  4   8.1791  0.08523 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Analysis of Deviance Table

Model 1: binary_outcome ~ 1
Model 2: binary_outcome ~ condition
  Resid. Df Resid. Dev Df Deviance    Rao Pr(>Chi)
1       104     135.01                            
2       100     126.83  4   8.1791 7.7778   0.1001

If I change the anova() test to "Rao" instead, following the discussion here, I get the same values as the chi-squared test; otherwise they are slightly different. In both cases, however, the test is non-significant (although the LRT could be considered marginal, p = .085).
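
(As an aside, the same omnibus tests can be obtained in a single call with drop1(), without fitting the null model explicitly; a sketch:)

# Single-call equivalents of the null-vs-full comparisons above
drop1(model_1, test = "LRT")  # likelihood ratio test for dropping condition
drop1(model_1, test = "Rao")  # Rao score test, matching the Pearson chi-squared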

If I understand correctly, these tests are saying that, overall, there is no effect of condition. However, I don't understand what to do then with the significant coefficients: should I still report them? Or is there something else I am missing in the interpretation of the models? In my mind, this would be analogous to conducting post-hoc tests after a non-significant omnibus test in an ANOVA, which is generally considered bad practice. On the other hand, the coefficients do show a strong effect of condition 4.
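
(In case it is relevant for answers: if comparisons against the control do turn out to be of interest, multiplicity-adjusted contrasts would be one option, e.g. Dunnett-type comparisons via the multcomp package; a sketch, assuming model_1 as above:)

library(multcomp)
# Dunnett contrasts: each condition compared with the control level,
# with a joint (single-step) adjustment for the four comparisons
summary(glht(model_1, linfct = mcp(condition = "Dunnett")))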

becbot
    You need to write down the hypothesis you are testing. I doubt that you are really interested in the hypotheses tested in the summary of `model_2`. The likelihood ratio test also tests a different hypothesis than the p-values of coefficients in `model_1`. – Roland Jul 30 '21 at 11:31
  • @Roland done! We are interested in comparing the likelihood of the outcome occurring between conditions. Yes, I assume that the likelihood ratio is testing the overall effect of condition, whereas the coefficients in model_1 test the difference between conditions. However, if the overall effect of condition is non-significant, I was not sure if it is correct to still report the significant coefficients comparing conditions with each other. – becbot Jul 30 '21 at 11:49
  • Although there are many diverse questions here, the subject *sounds* like it has been treated at https://stats.stackexchange.com/questions/24720. – whuber Jul 30 '21 at 14:23

1 Answer


A major problem with significance tests is that if you run many tests, then under the null hypothesis the chance that one (or a few) of them comes out significant is higher than your nominal significance level. This is relatively easy to understand: say one test has a type I error probability of 5%, and you then run another one with the same type I error probability. It should be clear that the chance that at least one of the two makes a type I error is more than 5% and up to 10%.

For this reason, looking at many p-values and picking out the one or two that are significant for interpretation can easily go wrong. In the first analysis you effectively look at four p-values, one for each of the four conditions; one of them is significant, but it isn't at all clear whether this is a meaningful result. It is therefore a good idea to keep the number of tests to be interpreted down. This is a good reason to run the chi-squared/deviance test first, which gives you a single p-value for all conditions, and to stop if that test isn't significant. Because of what I explained above, there is no contradiction between this and the fact that you find a significant p-value among the individual conditions; the latter doesn't seem to be meaningful in this situation.
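
To make this concrete with the numbers from model_1, here is a quick sketch: a Holm adjustment of the four coefficient p-values already removes the one significant result.

# Raw p-values of the four condition coefficients in model_1
p_raw <- c(0.5373, 0.2155, 0.1204, 0.0132)
p.adjust(p_raw, method = "holm")
# smallest adjusted p-value: 0.0132 * 4 = 0.0528, no longer below 0.05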

Christian Hennig