I have fitted a quasibinomial model to my data, but the estimated dispersion parameter seems far too large at 40.78776.
glm(formula = total_SP/all_SP ~ Campus + Gender + Programme +
    Total_testscore + Hours_Math_SE, family = quasibinomial(link = "logit"),
    data = starters, weights = all_SP)
Deviance Residuals:
     Min       1Q   Median       3Q      Max
 -25.311   -3.541    0.167    4.634   14.491
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.091748 0.170743 -6.394 8.79e-10 ***
Campusghent -0.004444 0.085363 -0.052 0.95853
Genderfemale 0.093789 0.078242 1.199 0.23187
ProgrammeInt 0.232205 0.085364 2.720 0.00702 **
Total_testscore 0.038353 0.014448 2.655 0.00849 **
Hours_Math_SE 0.143800 0.026413 5.444 1.32e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasibinomial family taken to be 40.78776)
Null deviance: 12429.4 on 237 degrees of freedom
Residual deviance: 9980.1 on 232 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 4
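For reference, the dispersion value in the summary can be reproduced by hand: it is the sum of squared Pearson residuals divided by the residual degrees of freedom. A minimal check, assuming the fitted model is stored in an object I call fit here (the name is mine, not part of the output above):

fit <- glm(total_SP/all_SP ~ Campus + Gender + Programme +
             Total_testscore + Hours_Math_SE,
           family = quasibinomial(link = "logit"),
           data = starters, weights = all_SP)

# summary() estimates the quasibinomial dispersion as the sum of squared
# Pearson residuals divided by the residual degrees of freedom
sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
# returns roughly 40.79, matching the value printed above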
According to the AICcmodavg package, overdispersion shouldn't be > 4:
Note that values of c-hat > 1 indicate overdispersion (variance > mean), but that values much higher than 1 (i.e., > 4) probably indicate lack-of-fit.
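Using the numbers reported above, a deviance-based estimate of c-hat comes out similarly large (again using the fit object defined above):

# deviance-based c-hat: residual deviance / residual degrees of freedom
9980.1 / 232                       # about 43
deviance(fit) / df.residual(fit)   # same quantity from the fitted object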
The purpose of the model is to explain the data, not to predict. I want to find out which are the most influential predictors.
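For context, a minimal sketch of how I would compare the predictors while accounting for the estimated dispersion, using single-term F-tests on the fit object defined above:

# drop1() with test = "F" uses the estimated dispersion when judging
# the contribution of each term, rather than assuming dispersion = 1
drop1(fit, test = "F")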
My data look like this (but I cannot post the full data set):
> str(starters)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 238 obs. of 51 variables:
$ Campus : Factor w/ 2 levels "bru","ghent": 2 2 2 2 2 2 2 2 2 1 ...
$ Gender : Factor w/ 2 levels "male","female": 1 1 1 2 2 1 2 1 2 2 ...
$ Generation_student : Factor w/ 2 levels "J","N": 1 1 1 1 1 1 1 1 1 1 ...
$ New_in_programme : Factor w/ 2 levels "J","N": 1 1 1 1 1 1 1 1 1 1 ...
$ Programme : Factor w/ 2 levels "Arch","Int": 1 1 1 1 1 1 1 1 1 2 ...
$ SE_track : Factor w/ 3 levels "ASO","KSO","TSO": 1 2 3 1 1 1 2 3 1 3 ...
$ Secondary_education : Factor w/ 72 levels "2e lj 3e gr Architecturale vorming KSO",..: 28 16 25 30 28 70 16 25 28 62 ...
$ Hours_Math_SE : num 3 6 4 6 4 6 6 4 3 3 ...
$ Total_testscore : num 13 11 12 11 9 13 12 12 14 8 ...
$ CSE : num 33 67 100 67 17 50 83 100 100 50 ...
$ Percentage : num 30.8 50 59.2 56.7 40 ...
$ Motivation_RAW : num 28 30 31 30 29 22 28 30 24 34 ...
$ Motivation_Norm : Factor w/ 5 levels "average","good",..: 1 2 2 2 1 4 1 2 5 3 ...
$ Time_RAW : num 21 22 30 23 24 12 32 31 23 29 ...
$ Time_NORM : Factor w/ 5 levels "average","good",..: 5 5 3 1 1 4 3 3 1 2 ...
$ Concentratie_RAW : num 24 25 29 26 26 14 26 35 28 31 ...
$ Concentration_NORM : Factor w/ 5 levels "average","good",..: 5 1 2 1 1 4 1 3 1 2 ...
$ Anxiety_RAW : num 27 31 36 29 17 31 28 26 30 22 ...
$ Anxiety_NORM : Factor w/ 5 levels "average","high",..: 3 5 5 3 4 5 3 1 5 1 ...
$ Teststrategieen_RAW : num 30 25 32 25 25 27 29 32 33 28 ...
$ Teststrategieen_NORM : Factor w/ 5 levels "average","good",..: 1 5 2 5 5 5 1 2 2 1 ...
$ Hours_Math_SE_f : Ord.factor w/ 3 levels "low"<"medium"<..: 1 3 2 3 2 3 3 2 1 1 ...
$ Percentage_f : Factor w/ 3 levels "low","medium",..: 1 2 2 2 1 2 3 3 3 1 ...
$ Total_testscore_f : Factor w/ 4 levels "(0,5]","(5,10]",..: 3 3 3 3 2 3 3 3 4 2 ...
$ CSE_f : Ord.factor w/ 4 levels "unsufficient"<..: 2 3 4 3 1 2 4 4 4 2 ...
$ total_SP : num 185 300 355 340 240 295 385 400 390 235 ...
$ all_SP : num 600 600 600 600 600 600 600 600 600 600 ...
What can I do to fix my model?