I have recently come across an unusual way of testing whether a variable is significant.
Let the model be
$Cholesterol=Sex+FamilyHistory + Sex*FamilyHistory$
where * denotes the interaction, Sex and FamilyHistory are both two-level factors, and the observations are nested within cities. Further, assume that the fitted model is a linear mixed model with a random intercept for City.
Given this model, I want to find out whether Sex has a significant effect on the response.
What I normally do is fit the model and look at the p-value of the Sex coefficient in the marginal test (simply using `summary()` on the `nlme` fit in R). However, I have been told that I should instead fit a null model such as
$Cholesterol=FamilyHistory$
or perhaps $Cholesterol=FamilyHistory + Sex*FamilyHistory$,
and compare it with the full model using `anova(full.model, null.model)` in R, which performs a likelihood ratio test.
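To make the two approaches concrete for the Cholesterol setting, here is a minimal sketch on *hypothetical simulated data* (the city count, group sizes, and effect sizes below are made up for illustration, not taken from any real study). It fits the random-intercept model with `nlme::lme()` by maximum likelihood, prints the coefficient table (approach 1), and runs `anova()` on the nested fits (approach 2). Note that dropping Sex via the model $Cholesterol=FamilyHistory$ also drops the interaction, so this likelihood ratio test has 2 degrees of freedom and tests both terms jointly.

```r
library(nlme)

set.seed(1)
# Hypothetical simulated data: 20 cities with 30 subjects each
n_city <- 20
n_per  <- 30
dat <- data.frame(
  City          = factor(rep(seq_len(n_city), each = n_per)),
  Sex           = factor(sample(c("Male", "Female"), n_city * n_per, replace = TRUE)),
  FamilyHistory = factor(sample(c("No", "Yes"), n_city * n_per, replace = TRUE))
)

# Made-up effects: a city-level random intercept plus Sex and FamilyHistory shifts
city_effect <- rnorm(n_city, sd = 10)
dat$Cholesterol <- 200 +
  5  * (dat$Sex == "Male") +
  15 * (dat$FamilyHistory == "Yes") +
  city_effect[as.integer(dat$City)] +
  rnorm(nrow(dat), sd = 20)

# Full model: Sex, FamilyHistory and their interaction, random intercept per City
full <- lme(Cholesterol ~ Sex * FamilyHistory, random = ~ 1 | City,
            data = dat, method = "ML")

# Null model: Sex (and therefore the interaction) removed
null <- lme(Cholesterol ~ FamilyHistory, random = ~ 1 | City,
            data = dat, method = "ML")

# Approach 1: Wald t-tests of the individual fixed-effect coefficients
tt <- summary(full)$tTable
print(tt)

# Approach 2: likelihood ratio test of the nested models (both fitted by ML)
lrt <- anova(full, null)
print(lrt)
```

Both fits use `method = "ML"` because likelihood ratio comparisons of models with different fixed effects are not meaningful under REML.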
So which one do you recommend?
UPDATE: As @Stefan requested, here is an example in R based on the example in the `lme` help page. Sorry that it comes from a different context.
Let the full model be the following (it is the same as `fm2` in the `lme` help-page example, except that it is fitted by maximum likelihood):
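```r
library(nlme)  # provides lme() and the Orthodont data set

full <- lme(
  distance ~ age + Sex,
  data   = Orthodont,
  random = ~ 1,    # random intercept; the grouping (Subject) comes from the groupedData formula
  method = "ML"    # maximum likelihood, so that models with different fixed effects can be compared
)
```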
The goal of the analysis is to determine whether Sex has a significant effect on the response.
My approach is to look at the p-value of the Sex coefficient in the full model:
```r
r1 <- summary(full)
r1$tTable
```

```
                 Value  Std.Error DF   t-value      p-value
(Intercept) 17.7067130 0.83154591 80 21.293729 1.057125e-34
age          0.6601852 0.06209293 80 10.632212 5.740140e-17
SexFemale   -2.3210227 0.74306676 25 -3.123572 4.478461e-03
```
This suggests that females have a significantly lower mean `distance` than males at $\alpha = 10^{-2}$. However, I have been told that the correct way to test the Sex effect is to fit a null model that excludes Sex and compare it with the full model:
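As a quick check of where the coefficient-table p-value comes from, the sketch below recomputes it from the `SexFemale` row, assuming (as I understand `summary.lme` does) that it is the two-sided tail probability of the $t$ statistic with the reported denominator degrees of freedom. For comparison it also computes the large-sample Wald version, which refers $t^2$ to a $\chi^2_1$ distribution instead.

```r
# Values copied from the SexFemale row of r1$tTable above
t_value <- -3.123572
df_den  <- 25

# Two-sided p-value from the t distribution with 25 denominator df
p_t <- 2 * pt(-abs(t_value), df = df_den)

# Large-sample Wald test: squared t statistic against a chi-square with 1 df
p_wald_chisq <- pchisq(t_value^2, df = 1, lower.tail = FALSE)

c(t_test = p_t, wald_chisq = p_wald_chisq)
```

The first value should reproduce the `p-value` column of the table; the second ignores the finite-sample $t$ correction and is therefore smaller.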
```r
null <- lme(
  distance ~ age,    # Sex removed
  data   = Orthodont,
  random = ~ 1,
  method = "ML"
)
anova(full, null)
```

```
     Model df      AIC      BIC    logLik   Test  L.Ratio p-value
null     1  4 451.3895 462.1181 -221.6948
full     2  5 444.8565 458.2671 -217.4282 1 vs 2 8.533057  0.0035
```
As you can see, the p-value here ($0.0035$) is slightly smaller than the one from the full model's coefficient table ($0.00448$). I understand that the two tests are different: `anova()` performs a likelihood ratio test whose statistic is referred to a $\chi^2$ distribution, whereas the coefficient table reports a Wald test based on the $t$ distribution. However, I need to know which one is appropriate.
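For reference, the `L.Ratio` and `p-value` columns of the `anova()` output can be reproduced by hand from the two log-likelihoods. This is a minimal sketch using the rounded values printed above: the statistic is twice the difference in maximised log-likelihoods, referred to a $\chi^2$ distribution with degrees of freedom equal to the difference in parameter counts ($5 - 4 = 1$).

```r
# Log-likelihoods copied from the anova(full, null) output above
ll_full <- -217.4282
ll_null <- -221.6948

# Likelihood ratio statistic: twice the difference in maximised log-likelihoods
lr_stat <- 2 * (ll_full - ll_null)

# Degrees of freedom: difference in the number of parameters (df column: 5 vs 4)
df_diff <- 5 - 4

# Upper-tail chi-square probability
p_lr <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

c(L.Ratio = lr_stat, p.value = p_lr)
```

Small discrepancies in the last digits come from the rounding of the printed log-likelihoods.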