For N subjects, a test is run 3 times, resulting in success or failure each time. Thus, for each subject, there are 3 binary results, whose count of successes `Y` is the dependent variable of interest. For each subject, the cognitive abilities `A` and `B` are assessed, yielding two numerical scores from 0 to 20. The aim of the study is to find out how `A` and `B` affect `Y`. Because this is a developmental study, subjects' `Y` scores will also be affected by their age (`Age`) and by their language ability (which increases with age), assessed by two supposedly independent measures, `L1` and `L2`. We therefore need to partial out the effects of age (`Age`) and language (`L1` and `L2`) and explore `Y=F(A,B)`.
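One possible way to write this setup down, as a sketch only: it assumes the three trials of a subject are independent given the covariates and share one success probability, and the symbols $p_i$, $\beta$ and $g$ are introduced here for illustration and are not part of the original design.

$$
Y_i \sim \mathrm{Binomial}(3,\, p_i), \qquad
\operatorname{logit}(p_i) = \beta_0 + \beta_A A_i + \beta_B B_i + \beta_{AB} A_i B_i + g(\mathrm{Age}_i, \mathrm{L1}_i, \mathrm{L2}_i)
$$

Here $g$ stands for whichever adjustment for age and language is chosen, and the candidate models differ mainly in what they put in its place.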
I'm mostly trying to decide between the following models. What do they actually mean in this context, and what assumptions do they make? Which one best fits the research question?
1. Using `glmer` from `lme4`: `fit.glmer = glmer(Y ~ A*B + (1|Age) + (1|Age:L1) + (1|Age:L2), family=binomial, data)`
2. Using `glm` from base R: `fit.glm = glm(Y ~ A*B + Age*(L1 + L2), family=binomial, data)`
3. Using `pcor` from `ppcor`, for which I have no idea yet.
4. I think I can also do it step by step. First, fit `glm` for `Y~Age`, then use the residuals to fit `glm` for `Residuals ~ L1+L2`. Then take these residuals to fit the last `glm`: `Residual2 ~ A*B`. But this seems essentially the same as the `glm` method above.
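To make the fixed-effects option concrete, here is a minimal base R sketch on simulated data. Everything about the simulation (sample size, coefficients, the name `dat`) is made up for illustration. One detail differs from the calls above: with `family = binomial`, a count of successes out of 3 is usually passed as `cbind(Y, 3 - Y)` (successes, failures) rather than as the raw count.

```r
# Simulated data for illustration only; all numbers below are arbitrary.
set.seed(1)
N <- 200
dat <- data.frame(
  Age = runif(N, 4, 10),
  A   = sample(0:20, N, replace = TRUE),
  B   = sample(0:20, N, replace = TRUE)
)
# Language scores that rise with age (an assumption of this simulation).
dat$L1 <- 2.0 * dat$Age + rnorm(N, sd = 3)
dat$L2 <- 1.5 * dat$Age + rnorm(N, sd = 3)

# Success probability on the logit scale, then 3 trials per subject.
eta <- -4 + 0.3 * dat$Age + 0.1 * dat$A + 0.05 * dat$B
dat$Y <- rbinom(N, size = 3, prob = plogis(eta))

# Covariates-only model: age and language entered as fixed effects.
fit0 <- glm(cbind(Y, 3 - Y) ~ Age * (L1 + L2),
            family = binomial, data = dat)

# Full model: adds the abilities of interest and their interaction.
fit1 <- glm(cbind(Y, 3 - Y) ~ A * B + Age * (L1 + L2),
            family = binomial, data = dat)

# Likelihood-ratio test of the A, B and A:B terms,
# adjusted for age and language.
lrt <- anova(fit0, fit1, test = "Chisq")
print(lrt)
```

Comparing the two nested fits is one way to ask whether `A` and `B` explain anything beyond age and language. It does not settle whether age and language are better treated as fixed or random effects.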
What are the fundamental differences between these methods and their assumptions for this research problem? Which model is the most plausible one, violating the fewest assumptions while still giving the results of interest?