27

Here is a little background about my situation: my data refer to the number of prey successfully eaten by a predator. As the number of prey is limited (25 available) in each trial, I had a column "Sample" representing the number of available prey (so, 25 in each trial), and another called "Count" which was the number of success (how many prey were eaten). I based my analysis on the example from the R book on proportion data (page 578). The explanatory variables are Temperature (4 levels, which I treated as factor), and Sex of the predator (obviously, male or female). So I end up with this model:

model <- glm(y ~ Temperature+Sex+Temperature*Sex, 
          data=predator, family=quasibinomial) 

After getting the Analysis of Deviance table, it turns out that Temperature and Sex (but not the interaction) have a significant effect on the consumption of prey. Now, my problem: I need to know which temperatures differ, i.e., I have to compare the 4 temperatures to each other. If I had a linear model, I would use the TukeyHSD function, but as I am using a GLM I can't. I have been looking through the package MASS and trying to set up a contrast matrix but for some reason it doesn't work. Any suggestions or references?

Here's the summary I get from my model, if that helps making it clearer ...

y <- cbind(data$Count, data$Sample-data$Count)
model <- glm(y ~ Temperature+Sex+Temperature*Sex, 
             data=predator, family=quasibinomial) 
> summary(model)

# Call:
# glm(formula = y ~ Temperature + Sex + Temperature * Sex, 
       family=quasibinomial, data=data)

# Deviance Residuals: 
#     Min       1Q   Median       3Q      Max  
# -3.7926  -1.4308  -0.3098   0.9438   3.6831  
    
# Coefficients:
#                                        Estimate Std. Error t value Pr(>|t|)    
# (Intercept)                             -1.6094     0.2672  -6.024 3.86e-08 ***
# Temperature8                             0.3438     0.3594   0.957   0.3414    
# Temperature11                           -1.0296     0.4803  -2.144   0.0348 *  
# Temperature15                           -1.2669     0.5174  -2.449   0.0163 *  
# SexMale                                    0.3822     0.3577   1.069   0.2882    
# Temperature8:SexMale                    -0.2152     0.4884  -0.441   0.6606    
# Temperature11:SexMale                    0.4136     0.6093   0.679   0.4990    
# Temperature15:SexMale                    0.4370     0.6503   0.672   0.5033    
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
    
# (Dispersion parameter for quasibinomial family taken to be 2.97372)    
#     Null deviance: 384.54  on 95  degrees of freedom
# Residual deviance: 289.45  on 88  degrees of freedom
# AIC: NA   
# Number of Fisher Scoring iterations: 5
kjetil b halvorsen
  • 63,378
  • 26
  • 142
  • 467
Anne
  • 303
  • 1
  • 4
  • 6
  • 2
    Hi @Anne and welcome. You can try to use the `glht` function in the [`multcomp` package](http://cran.r-project.org/web/packages/multcomp/multcomp.pdf). To perform TukeyHSD tests for temperature, use it like that `glht(my.glm, mcp(Temperature="Tukey"))`. And btw: Your model formula can be abbreviated to: `model – COOLSerdash May 29 '13 at 14:13
  • Hi, thanks for your quick reply ! However I must be doing something wrong because I only get an error message... I assume that my.glm is the glm I performed earlier (therefore, "model" in the case). What does mcp refer to ? I get an error message saying that the dimensions of coefficients and covariance matrix don't match... ? – Anne May 29 '13 at 14:22
  • It would be helpful if you would edit your question and include the model output. – COOLSerdash May 29 '13 at 14:29
  • 3
    Why did you model `Temperature` as a factor? Don't you have the actual numerical values? I would use them as a continuous variable & then this entire issue is moot. – gung - Reinstate Monica May 29 '13 at 14:33
  • 2
    I am using Temperature as a factor because that is how I defined it in my experimental design. I guess it could be taken as a numerical value, but I would like to know how to solve my comparison problem on a factor all the same, as I might need it for further data (say, if I had "Substrate type" instead of Temperature). * COOLSerdash, I don't know if my editing is what you were looking for as a "model output"... Tell me if I got it wrong and I'll make my best to fix it. – Anne May 29 '13 at 15:30
  • 4
    It's perfectly reasonable to want to know how to do this in general; your question stands. However, w/ regard to your specific situation, I would use temp as a continuous variable even if you had originally thought of it as a factor. Setting aside issues w/ multiple comparisons, modeling temp as a factor is an inefficient use of the info you have. – gung - Reinstate Monica May 29 '13 at 15:48

1 Answers1

19

Anne, I will shorty explain how to do such multiple comparisons in general. Why this doesn't work in your specific case, I don't know; I'm sorry.

But normally, you can do it with the multcomp package and the function glht. Here is an example:

mydata      <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mydata$rank <- factor(mydata$rank)
my.mod      <- glm(admit~gre+gpa*rank, data=mydata, family=quasibinomial)

summary(my.mod)
# 
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)  
# (Intercept) -4.985768   2.498395  -1.996   0.0467 *
# gre          0.002287   0.001110   2.060   0.0400 *
# gpa          1.089088   0.731319   1.489   0.1372  
# rank2        0.503294   2.982966   0.169   0.8661  
# rank3        0.450796   3.266665   0.138   0.8903  
# rank4       -1.508472   4.202000  -0.359   0.7198  
# gpa:rank2   -0.342951   0.864575  -0.397   0.6918  
# gpa:rank3   -0.515245   0.935922  -0.551   0.5823  
# gpa:rank4   -0.009246   1.220757  -0.008   0.9940  
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

If you wanted to calculate the pairwise comparisons between rank using Tukey's HSD, you could do that in this way:

library(multcomp)
summary(glht(my.mod, mcp(rank="Tukey")))
# 
#    Simultaneous Tests for General Linear Hypotheses
# 
# Multiple Comparisons of Means: Tukey Contrasts
# 
# Fit: glm(formula = admit ~ gre + gpa * rank, family = quasibinomial, data = mydata)   
# 
# Linear Hypotheses:
#            Estimate Std. Error z value Pr(>|z|)
# 2 - 1 == 0   0.5033     2.9830   0.169    0.998
# 3 - 1 == 0   0.4508     3.2667   0.138    0.999
# 4 - 1 == 0  -1.5085     4.2020  -0.359    0.984
# 3 - 2 == 0  -0.0525     2.6880  -0.020    1.000
# 4 - 2 == 0  -2.0118     3.7540  -0.536    0.949
# 4 - 3 == 0  -1.9593     3.9972  -0.490    0.960
# (Adjusted p values reported -- single-step method)
# 
# Warning message:
# In mcp2matrix(model, linfct = linfct) :
#   covariate interactions found -- default contrast might be inappropriate

All pairwise comparisons are given together with a $p$-value. Warning: These comparisons only concern the main effects. The interactions are ignored. If interactions are present, a warning will be given (as in the output above). For a more extensive tutorial on how to proceed when interactions are present, see these additional multcomp examples.

Note: As @gung noted in the comments, you should - whenever possible - include temperature as a continuous rather than a categorical variable. Concerning the interaction: you could perform a likelihood ratio test to check whether the interaction term significantly improves the model fit. In your case, the code would look something like that:

# Original model
model <- glm(y ~ Temperature+Sex+Temperature*Sex, data=predator, family=quasibinomial) 

# Model without an interaction
model2 <- glm(y ~ Temperature+Sex data=predator, family=quasibinomial) 

# Likelihood ratio test
anova(model, model2, test="LRT")

If this test is not significant, you may remove the interaction from your model. Maybe glht will work then?

COOLSerdash
  • 25,317
  • 8
  • 73
  • 123
  • 1
    Oh god, thank you SO much !! I have been able to write the command correctly this time and it worked ! Thanks again ! – Anne May 29 '13 at 17:45
  • 1
    Additionnal question : is there a way to get multiple comparisons on the interaction ? I've got similar data, where the interaction (from the initial question, that would be Temperature*Sex) is significant, and I was wondering if it is possible to compare those together... – Anne Nov 01 '13 at 12:10
  • 1
    Do you mean multiple comparison for each level of the interaction? If yes, you might find [this site](http://www.ats.ucla.edu/stat/r/faq/testing_contrasts.htm) interesting (the last paragraph shows how to test all possible pairwise combinations). – COOLSerdash Nov 01 '13 at 21:20
  • you can create a variable which corresponds to the interactions for a variable and use this variable to carry out the mcp. You do it like this. mydata\$gparank – Notquitesure Sep 09 '14 at 08:33
  • @COOLSerdash, that link no longer works - any idea where it is now? – Nova Aug 29 '17 at 14:15
  • 1
    @Nova which link do you mean? The one in the comments? [Here](https://stats.idre.ucla.edu/r/faq/how-can-i-test-contrasts-in-r/) is the new link to that site. – COOLSerdash Aug 29 '17 at 15:46