I have fitted a quasibinomial model to my data, but the estimated dispersion parameter seems far too large at 40.78776.
glm(formula = total_SP/all_SP ~ Campus + Gender + Programme +
    Total_testscore + Hours_Math_SE, family = quasibinomial(link = "logit"),
    data = starters, weights = all_SP)
Deviance Residuals:
     Min       1Q   Median       3Q      Max
 -25.311   -3.541    0.167    4.634   14.491
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.091748 0.170743 -6.394 8.79e-10 ***
Campusghent -0.004444 0.085363 -0.052 0.95853
Genderfemale 0.093789 0.078242 1.199 0.23187
ProgrammeInt 0.232205 0.085364 2.720 0.00702 **
Total_testscore 0.038353 0.014448 2.655 0.00849 **
Hours_Math_SE 0.143800 0.026413 5.444 1.32e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasibinomial family taken to be 40.78776)
Null deviance: 12429.4 on 237 degrees of freedom
Residual deviance: 9980.1 on 232 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 4
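For reference, the dispersion value in the summary can be reproduced by hand: it is the sum of squared Pearson residuals divided by the residual degrees of freedom. A minimal check, assuming the fitted model is stored in an object I call fit here (the name is mine, not part of the output above):

fit <- glm(total_SP/all_SP ~ Campus + Gender + Programme +
             Total_testscore + Hours_Math_SE,
           family = quasibinomial(link = "logit"),
           data = starters, weights = all_SP)

# summary() estimates the quasibinomial dispersion as the sum of squared
# Pearson residuals divided by the residual degrees of freedom
sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
# returns roughly 40.79, matching the value printed above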
According to the AICcmodavg package, overdispersion shouldn't be > 4:
Note that values of c-hat > 1 indicate overdispersion (variance > mean), but that values much higher than 1 (i.e., > 4) probably indicate lack-of-fit.
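Using the numbers reported above, a deviance-based estimate of c-hat comes out similarly large (again using the fit object defined above):

# deviance-based c-hat: residual deviance / residual degrees of freedom
9980.1 / 232                       # about 43
deviance(fit) / df.residual(fit)   # same quantity from the fitted object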
The purpose of the model is to explain the data, not to predict. I want to find out which are the most influential predictors.
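For context, a minimal sketch of how I would compare the predictors while accounting for the estimated dispersion, using single-term F-tests on the fit object defined above:

# drop1() with test = "F" uses the estimated dispersion when judging
# the contribution of each term, rather than assuming dispersion = 1
drop1(fit, test = "F")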
My data look like this (but I cannot post the full data set):
> str(starters)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 238 obs. of 51 variables:
$ Campus : Factor w/ 2 levels "bru","ghent": 2 2 2 2 2 2 2 2 2 1 ...
$ Gender : Factor w/ 2 levels "male","female": 1 1 1 2 2 1 2 1 2 2 ...
$ Generation_student : Factor w/ 2 levels "J","N": 1 1 1 1 1 1 1 1 1 1 ...
$ New_in_programme : Factor w/ 2 levels "J","N": 1 1 1 1 1 1 1 1 1 1 ...
$ Programme : Factor w/ 2 levels "Arch","Int": 1 1 1 1 1 1 1 1 1 2 ...
$ SE_track : Factor w/ 3 levels "ASO","KSO","TSO": 1 2 3 1 1 1 2 3 1 3 ...
$ Secondary_education : Factor w/ 72 levels "2e lj 3e gr Architecturale vorming KSO",..: 28 16 25 30 28 70 16 25 28 62 ...
$ Hours_Math_SE : num 3 6 4 6 4 6 6 4 3 3 ...
$ Total_testscore : num 13 11 12 11 9 13 12 12 14 8 ...
$ CSE : num 33 67 100 67 17 50 83 100 100 50 ...
$ Percentage : num 30.8 50 59.2 56.7 40 ...
$ Motivation_RAW : num 28 30 31 30 29 22 28 30 24 34 ...
$ Motivation_Norm : Factor w/ 5 levels "average","good",..: 1 2 2 2 1 4 1 2 5 3 ...
$ Time_RAW : num 21 22 30 23 24 12 32 31 23 29 ...
$ Time_NORM : Factor w/ 5 levels "average","good",..: 5 5 3 1 1 4 3 3 1 2 ...
$ Concentratie_RAW : num 24 25 29 26 26 14 26 35 28 31 ...
$ Concentration_NORM : Factor w/ 5 levels "average","good",..: 5 1 2 1 1 4 1 3 1 2 ...
$ Anxiety_RAW : num 27 31 36 29 17 31 28 26 30 22 ...
$ Anxiety_NORM : Factor w/ 5 levels "average","high",..: 3 5 5 3 4 5 3 1 5 1 ...
$ Teststrategieen_RAW : num 30 25 32 25 25 27 29 32 33 28 ...
$ Teststrategieen_NORM : Factor w/ 5 levels "average","good",..: 1 5 2 5 5 5 1 2 2 1 ...
$ Hours_Math_SE_f : Ord.factor w/ 3 levels "low"<"medium"<..: 1 3 2 3 2 3 3 2 1 1 ...
$ Percentage_f : Factor w/ 3 levels "low","medium",..: 1 2 2 2 1 2 3 3 3 1 ...
$ Total_testscore_f : Factor w/ 4 levels "(0,5]","(5,10]",..: 3 3 3 3 2 3 3 3 4 2 ...
$ CSE_f : Ord.factor w/ 4 levels "unsufficient"<..: 2 3 4 3 1 2 4 4 4 2 ...
$ total_SP : num 185 300 355 340 240 295 385 400 390 235 ...
$ all_SP : num 600 600 600 600 600 600 600 600 600 600 ...
What can I do to fix my model?