höstvår kön gdksammatermin ålder fbkurs stterm prog kalender
1 vår man FALSE 69 FALSE vår FRIST 11
2 vår man NA 70 FALSE vår FRIST 12
3 höst kvinna NA 65 FALSE höst FRIST 7
4 höst kvinna TRUE 68 FALSE höst FRIST 11
5 höst kvinna NA 65 FALSE höst OVRIG 8
6 höst man FALSE 70 TRUE höst FRIST 13
I apologize that the data are in Swedish (höst = autumn, vår = spring, kön = sex, man = male, kvinna = female, ålder = age), but I don't think that complicates my question.
I fitted a logistic regression with the following results:
mod.fit <- glm(gdksammatermin ~ prog + poly(ålder, 3) + höstvår + kön + fbkurs + kalender, family = binomial, data = both)
summary(mod.fit)
Call:
glm(formula = gdksammatermin ~ prog + poly(ålder, 3) + höstvår +
kön + fbkurs + kalender, family = binomial, data = both)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.0706 -0.8988 -0.5165 0.9902 2.7866
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.00938 0.37623 5.341 9.25e-08 ***
progLARAA 0.43279 0.26116 1.657 0.09748 .
progNDATK -2.87554 0.73921 -3.890 0.00010 ***
progNFYSK 0.54302 0.20480 2.651 0.00802 **
progNMATK 0.24716 0.17088 1.446 0.14806
progNSFYY 0.76268 0.30490 2.501 0.01237 *
progOVRIG 0.01900 0.18000 0.106 0.91593
progSMEKK -0.57752 0.18718 -3.085 0.00203 **
poly(ålder, 3)1 -50.49014 5.24085 -9.634 < 2e-16 ***
poly(ålder, 3)2 15.66993 5.98530 2.618 0.00884 **
poly(ålder, 3)3 -10.31320 5.33974 -1.931 0.05343 .
höstvårhöst -0.21046 0.12254 -1.717 0.08590 .
könman -0.07997 0.10776 -0.742 0.45804
fbkursTRUE -0.10225 0.16954 -0.603 0.54645
kalender -0.22422 0.03064 -7.318 2.51e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2484.8 on 1874 degrees of freedom
Residual deviance: 2080.0 on 1860 degrees of freedom
(2146 observations deleted due to missingness)
AIC: 2110
Number of Fisher Scoring iterations: 5
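For reference (this check is not part of the original post): the z value and Pr(>|z|) columns of summary() are Wald tests. Each estimate is divided by its standard error and compared with a standard normal distribution, so every row tests one coefficient with all the other terms already in the model. For a factor such as prog, each row compares one level with the reference level rather than testing the factor as a whole. A small sketch reproducing two rows from the values printed above:

```r
# Estimates and standard errors copied from the summary(mod.fit) output above
est <- c(kalender = -0.22422, fbkursTRUE = -0.10225)
se  <- c(kalender =  0.03064, fbkursTRUE =  0.16954)

# Wald statistic: estimate divided by its standard error (the "z value" column)
z <- est / se

# Two-sided p-value from the standard normal (the "Pr(>|z|)" column)
p <- 2 * pnorm(-abs(z))

print(round(z, 3))
print(signif(p, 3))
```

Small differences in the last digit come from the rounding of the printed estimates.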
But when I run anova(), some of the terms are no longer significant:
anova(mod.fit, test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: respons
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 1874 2484.8
prog 7 105.684 1867 2379.1 < 2.2e-16 ***
poly(ålder, 3) 3 241.500 1864 2137.6 < 2.2e-16 ***
höstvår 1 0.534 1863 2137.1 0.4647
kön 1 0.659 1862 2136.4 0.4168
fbkurs 1 1.635 1861 2134.8 0.2010
kalender 1 54.788 1860 2080.0 1.342e-13 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
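For comparison (again a check added here, not from the original post): anova(mod.fit, test = "Chisq") adds the terms one at a time, in formula order. Each row is a likelihood-ratio test: the drop in residual deviance when that term is added to the terms listed above it, compared with a chi-squared distribution on Df degrees of freedom. So höstvår, for example, is tested after prog and poly(ålder, 3) only, not after kön, fbkurs and kalender. Only kalender, which enters last, is tested given all the other terms, the same hypothesis as its Wald test in summary(). A sketch reproducing two Pr(>Chi) values from the table above:

```r
# Deviance drops (likelihood-ratio statistics) and df copied from the anova() table
lr  <- c(fbkurs = 1.635, kalender = 54.788)
dof <- c(fbkurs = 1,     kalender = 1)

# Upper-tail chi-squared probability gives the "Pr(>Chi)" column
p_lr <- pchisq(lr, df = dof, lower.tail = FALSE)
print(signif(p_lr, 4))

# kalender enters last, so its LR statistic can be compared with the
# squared Wald z for kalender from summary(), (-7.318)^2
print(c(lr = unname(lr[["kalender"]]), wald = (-7.318)^2))
```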
I've read several related questions here on Cross Validated, but I still can't sort it out. My goal is to find the model that "best" explains gdksammatermin: keep only the most important covariates, then see which of them affect gdksammatermin and how. So which output should I rely on, anova() or glm()?
I want to find out which covariates I can exclude and which ones are important to keep in the model.
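One way to get a test for each term that does not depend on the order of the formula is drop1() from base R's stats package. It refits the model with each term removed in turn (keeping all the others) and, with test = "Chisq", reports the likelihood-ratio statistic (LRT) and its p-value. A minimal sketch on simulated data, assuming nothing about the data in the question: the variables x1, x2, x3 and the data frame sim are hypothetical, and x2 is built to be correlated with x1 so that the order of terms matters.

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)   # hypothetical: x2 strongly correlated with x1
x3 <- rnorm(n)
# In this simulation only x1 and x3 affect the outcome
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 1 * x1 + 0.5 * x3))
sim <- data.frame(y, x1, x2, x3)

fit <- glm(y ~ x2 + x1 + x3, family = binomial, data = sim)

# Sequential tests: x2 enters first, alone, so it picks up the
# effect it shares with x1
seq_tab <- anova(fit, test = "Chisq")

# Each term tested after all the others: x2 adds little once x1 is in
drop_tab <- drop1(fit, test = "Chisq")

print(seq_tab)
print(drop_tab)
```

In the question's setting the corresponding call would be drop1(mod.fit, test = "Chisq").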