Why do logistic regression and ANOVA give different p-values?

Question

    höstvår  kön  gdksammatermin  ålder fbkurs stterm  prog kalender
1     vår    man          FALSE    69  FALSE    vår   FRIST       11
2     vår    man             NA    70  FALSE    vår   FRIST       12
3    höst kvinna             NA    65  FALSE   höst   FRIST        7
4    höst kvinna           TRUE    68  FALSE   höst   FRIST       11
5    höst kvinna             NA    65  FALSE   höst   OVRIG        8
6    höst    man          FALSE    70   TRUE   höst   FRIST       13

I apologize that the data are in Swedish (höst/vår = autumn/spring, kön = sex, kvinna = woman, ålder = age), but I do not think that complicates my question.

I did a logistic regression with the following results:

mod.fit<-glm(gdksammatermin ~prog+poly(ålder, 3)+höstvår+kön+fbkurs+kalender, family=binomial,data=both)
summary(mod.fit)

Call:
glm(formula = gdksammatermin ~ prog + poly(ålder, 3) + höstvår + 
    kön + fbkurs + kalender, family = binomial, data = both)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.0706  -0.8988  -0.5165   0.9902   2.7866  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)       2.00938    0.37623   5.341 9.25e-08 ***
progLARAA         0.43279    0.26116   1.657  0.09748 .  
progNDATK        -2.87554    0.73921  -3.890  0.00010 ***
progNFYSK         0.54302    0.20480   2.651  0.00802 ** 
progNMATK         0.24716    0.17088   1.446  0.14806    
progNSFYY         0.76268    0.30490   2.501  0.01237 *  
progOVRIG         0.01900    0.18000   0.106  0.91593    
progSMEKK        -0.57752    0.18718  -3.085  0.00203 ** 
poly(ålder, 3)1 -50.49014    5.24085  -9.634  < 2e-16 ***
poly(ålder, 3)2  15.66993    5.98530   2.618  0.00884 ** 
poly(ålder, 3)3 -10.31320    5.33974  -1.931  0.05343 .  
höstvårhöst      -0.21046    0.12254  -1.717  0.08590 .  
könman           -0.07997    0.10776  -0.742  0.45804    
fbkursTRUE       -0.10225    0.16954  -0.603  0.54645    
kalender         -0.22422    0.03064  -7.318 2.51e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2484.8  on 1874  degrees of freedom
Residual deviance: 2080.0  on 1860  degrees of freedom
  (2146 observations deleted due to missingness)
AIC: 2110

Number of Fisher Scoring iterations: 5

But when I run anova(), some of the terms are no longer significant:

anova(mod.fit, test="Chisq")
Analysis of Deviance Table

Model: binomial, link: logit

Response: respons

Terms added sequentially (first to last)

               Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                            1874     2484.8              
prog            7  105.684      1867     2379.1 < 2.2e-16 ***
poly(ålder, 3)  3  241.500      1864     2137.6 < 2.2e-16 ***
höstvår         1    0.534      1863     2137.1    0.4647    
kön             1    0.659      1862     2136.4    0.4168    
fbkurs          1    1.635      1861     2134.8    0.2010    
kalender        1   54.788      1860     2080.0 1.342e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

I've looked at several related questions here on Cross Validated, but I still can't sort this out. I want to find the model that "best" explains gdksammatermin: I want to keep only the most important covariates, and then see which of them affect gdksammatermin and how.

So which output should I rely on, anova() or the summary of the glm() fit? I want to find out which covariates I can exclude and which ones are important to keep in the model.

"Terms added sequentially (first to last)" is the key phrase in your `anova` output. The terms to search for are "Type I" and "Type III" [tests](http://stats.stackexchange.com/q/20452/4485). `anova` performs Type I (sequential) tests, while the coefficient table from the logistic regression tests each coefficient given all the other terms, as in Type III. The quick version of the difference is that Type I tests $A$, $B|A$, $C|A,B$ while Type III tests $A|B,C$, $B|A,C$, $C|A,B$. – Affine, Apr 20 '15 at 12:48
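
The order dependence described in the comment above can be reproduced with a small simulation. The following is a minimal sketch in R on simulated data, not the poster's data set: the variables `x1`, `x2`, and `y` and all coefficients are made up for illustration. It fits the same logistic model with the two predictors entered in either order, then compares the sequential tests from `anova` with the likelihood-ratio tests from `drop1`, which test each term given all the others.

```r
# Simulated data (hypothetical; not the data from the question).
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)          # x2 is correlated with x1
y  <- rbinom(n, 1, plogis(-0.5 + 1.0 * x2))  # only x2 truly affects y

# Same model, predictors entered in a different order.
fit_a <- glm(y ~ x1 + x2, family = binomial)
fit_b <- glm(y ~ x2 + x1, family = binomial)

# Sequential (Type I) tests: each term is adjusted only for the terms
# listed before it, so the table depends on the order in the formula.
seq_a <- anova(fit_a, test = "Chisq")
seq_b <- anova(fit_b, test = "Chisq")

# Marginal likelihood-ratio tests: each term is adjusted for all the
# others, so the result does not depend on the order in the formula.
marg_a <- drop1(fit_a, test = "Chisq")
marg_b <- drop1(fit_b, test = "Chisq")

print(seq_a)
print(seq_b)
print(marg_a)
print(marg_b)
```

With correlated predictors, the `anova` row for a term changes depending on whether it enters first or last, while the `drop1` rows stay the same. The last row of an `anova` table is already adjusted for every other term, so it agrees with the corresponding `drop1` test.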
