I was doing some log-linear models to test for interactions/associations in multiway contingency tables (based on the tutorial here, http://ww2.coastal.edu/kingw/statistics/R-tutorials/loglin.html). I was doing this using a Poisson glm
on the observed frequencies as well as with MASS
's loglm
. I was just wondering though what type of hypothesis test would make most sense here, sequential type I using anova()
(not good since p values there depend on the order of the factors in the model), type III test using Anova()
in car
(independent of the order of the factors in the model) or using drop1
starting from the most complex model?
E.g. using the Titanic passenger survival data
library(COUNT)
data(titanic)
titanic=droplevels(titanic)
head(titanic)
mytable=xtabs(~class+age+sex+survived, data=titanic)
ftable(mytable)
survived no yes
class age sex
1st class child women 0 1
man 0 5
adults women 4 140
man 118 57
2nd class child women 0 13
man 0 11
adults women 13 80
man 154 14
3rd class child women 17 14
man 35 13
adults women 89 76
man 387 75
freqdata=data.frame(mytable)
fullmodel=glm(Freq~SITE*SEX*MORTALITY,family=poisson,data=freqdata)
Would the most sensible test for interactions between the different categorical factors then be given by type I SS as in
anova(fullmodel, test="Chisq")
Analysis of Deviance Table
Model: poisson, link: log
Response: Freq
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 23 2173.33
class 2 231.18 21 1942.15 < 2.2e-16 ***
age 1 1072.61 20 869.54 < 2.2e-16 ***
sex 1 137.74 19 731.80 < 2.2e-16 ***
survived 1 77.61 18 654.19 < 2.2e-16 ***
class:age 2 32.41 16 621.78 9.178e-08 ***
class:sex 2 29.61 14 592.17 3.719e-07 ***
age:sex 1 6.09 13 586.09 0.01363 *
class:survived 2 132.69 11 453.40 < 2.2e-16 ***
age:survived 1 25.58 10 427.81 4.237e-07 ***
sex:survived 1 312.93 9 114.89 < 2.2e-16 ***
class:age:sex 2 4.04 7 110.84 0.13250
class:age:survived 2 35.45 5 75.39 2.002e-08 ***
class:sex:survived 2 73.71 3 1.69 < 2.2e-16 ***
age:sex:survived 1 1.69 2 0.00 0.19421
class:age:sex:survived 2 0.00 0 0.00 1.00000
or using type III SS using car
's Anova
:
library(car)
library(afex)
set_sum_contrasts()
Anova(fullmodel, test="LR", type="III")
Analysis of Deviance Table (Type III tests)
Response: Freq
LR Chisq Df Pr(>Chisq)
class 37.353 2 7.744e-09 ***
age 5.545 1 0.0185317 *
sex 0.000 1 0.9999999
survived 1.386 1 0.2390319
class:age 5.476 2 0.0646851 .
class:sex 0.000 2 1.0000000
age:sex 0.000 1 0.9999888
class:survived 16.983 2 0.0002052 ***
age:survived 0.056 1 0.8126973
sex:survived 0.000 1 0.9999953
class:age:sex 0.000 2 1.0000000
class:age:survived 3.461 2 0.1771673
class:sex:survived 0.000 2 1.0000000
age:sex:survived 0.000 1 0.9999905
class:age:sex:survived 0.000 2 1.0000000
or using single term deletions and LRTs with drop1
:
fullmodel=glm(Freq~class+age+sex+survived+class:age+class:sex+class:survived+age:sex+age:survived+sex:survived, family=poisson, data=freqdata)
drop1(fullmodel,test="Chisq")
Single term deletions
Model:
Freq ~ class + age + sex + survived + class:age + class:sex +
class:survived + age:sex + age:survived + sex:survived
Df Deviance AIC LRT Pr(>Chi)
<none> 114.89 249.01
class:age 2 162.76 292.89 47.877 4.016e-11 ***
class:sex 2 115.74 245.86 0.850 0.6537
class:survived 2 230.95 361.08 116.067 < 2.2e-16 ***
age:sex 1 114.89 247.02 0.008 0.9294
age:survived 1 134.39 266.52 19.505 1.003e-05 ***
sex:survived 1 427.81 559.94 312.927 < 2.2e-16 ***
?
[This last result appears to match that of MASS
's loglm
, as should be the case :
fullmodel=loglm(~class+age+sex+survived+class:age+class:sex+class:survived+age:sex+age:survived+sex:survived, mytable)
stepAIC(fullmodel)
drop1(fullmodel,test="Chisq")
Single term deletions
Model:
~class + age + sex + survived + class:age + class:sex + class:survived +
age:sex + age:survived + sex:survived
Df AIC LRT Pr(>Chi)
<none> 144.89
class:age 2 188.76 47.877 4.016e-11 ***
class:sex 2 141.74 0.850 0.6537
class:survived 2 256.95 116.067 < 2.2e-16 ***
age:sex 1 142.89 0.008 0.9294
age:survived 1 162.39 19.505 1.003e-05 ***
sex:survived 1 455.81 312.927 < 2.2e-16 ***
]
(Any other more elegant ways btw to specify a model with main effects + all first order interaction effects?)
Any thoughts what would be the best way to analyse such multiway contingency tables, and adequately test for associations for unbalanced data sets?
EDIT: based on the answer below I went for the drop1
solution :
fullmodel=glm(Freq~class+age+sex+survived+class:age+class:sex+class:survived+age:sex+age:survived+sex:survived, family=poisson, data=freqdata)
drop1(fullmodel,test="Chisq")
which is equivalent to the log-linear model in MASS
:
fullmodel=loglm(~class+age+sex+survived+class:age+class:sex+class:survived+age:sex+age:survived+sex:survived, mytable)
stepAIC(fullmodel)
drop1(fullmodel,test="Chisq")