Logistic regression on a developmental study partialling out the confounding age effect

Question

For N subjects, a test is run 3 times, yielding success or failure each time. Thus, each subject has 3 binary results, and the count of successes Y is the dependent variable of interest. Each subject's cognitive abilities A and B are also assessed, yielding two numerical scores on a 0-20 scale. The aim of the study is to find out how A and B affect Y. Because this is a developmental study, subjects' Y scores will also be affected by their age (Age) and by their language ability (which increases with age), assessed by two supposedly independent measures, L1 and L2. We therefore need to partial out the effects of age (Age) and language (L1 and L2) and explore Y=F(A,B).

I'm mostly trying to decide between the following models. What do they actually mean in this context, and what assumptions do they make? Which one best fits the research question?

Using glmer from lme4:

fit.glmer = glmer(Y ~ A*B + (1|Age) + (1|Age:L1) + (1|Age:L2), family=binomial, data)

Using glm from base R:

fit.glm = glm(Y ~ A*B + Age*(L1 + L2), family=binomial, data)

Using pcor from ppcor, which I do not yet know how to apply.
I think I can also do it step by step: first fit a glm for Y~Age, then use its residuals to fit a glm for Residuals ~ L1+L2, and finally use those residuals to fit the last glm, Residual2 ~ A*B. But this seems essentially the same as the glm method above.

What is the fundamental difference between these methods and their assumptions for this research problem? Which model is the most plausible one, violating the fewest assumptions while still giving the results of interest?

Are the three trials different in any way? Are you trying to assess improvement over trials, eg, or do the covariates change over trials? — gung - Reinstate Monica, Sep 28 '16 at 01:13
The three trials are the same; the children were tested three times to reduce measurement error (the test yields only success or failure, and a child could easily fail for unrelated random reasons, such as getting distracted in that trial). Y is simply the sum of the scores over the three trials (0-3), but it can easily be recoded into a 0-1 binary with a criterion-based rule. I am trying to assess whether A and B have any effect on Y when controlling for the confounds Age, L1 and L2. — hyiltiz, Sep 28 '16 at 01:21

score 0 · Accepted Answer · edited Apr 13 '17 at 12:44

You don't need to use a GLMM. There really isn't any need to assess any random effects or use them to control for non-independence. By default, people think of logistic regression as appropriate when the response is a Bernoulli trial, but it can be a binomial where there was more than one Bernoulli trial. You can use a regular old logistic regression with the response being the number of successes and number of failures for each child (cf., here). In R, you would use:

fit.glm = glm(cbind(successes, failures)~A*B + Age*(L1+L2), family=binomial, data)

From there, the effects of Age and language (L1 and L2) are simply nuisance variables. You can ignore them in the output. You can start by assessing the interaction that you care about (i.e., A*B). If the interaction is sufficiently non-significant for your purposes, I would drop it and refit the model without it. The reason for this is that its existence complicates the interpretation of the 'main effects'. I would not drop it from the model just because the p-value is .06, however. You need to look at the magnitude of the coefficient and its standard error and decide if it makes sense to ignore it. I would certainly want the p-value to be above .2, for example. If you believe the interaction is real, you interpret it / the simple effects directly. Otherwise, interpret the main effects from the model without the interaction. Whichever way you go, I would try to visualize the data and the model.
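As a rough illustration of the workflow described above, here is a minimal sketch on simulated data. The data frame `dat`, the sample size, and all effect sizes are assumptions made up for the example, not values from the study; the `successes` and `failures` columns are derived from Y as `Y` and `3 - Y`. It fits the model with and without the A:B interaction and compares the two with a likelihood-ratio test.

```r
# Simulated data for illustration only: N, the effect sizes and the
# distributions below are assumptions, not values from the study.
set.seed(1)
N <- 200
dat <- data.frame(
  A   = round(runif(N, 0, 20)),   # cognitive score A (0-20)
  B   = round(runif(N, 0, 20)),   # cognitive score B (0-20)
  Age = runif(N, 4, 10)           # age in years (assumed range)
)
# Language measures that increase with age (assumed relationship)
dat$L1 <- 10 + 2.0 * dat$Age + rnorm(N, sd = 3)
dat$L2 <-  5 + 1.5 * dat$Age + rnorm(N, sd = 3)

# Per-trial success probability; no A:B interaction in this simulation
eta   <- -4 + 0.10 * dat$A + 0.05 * dat$B + 0.30 * dat$Age
dat$Y <- rbinom(N, size = 3, prob = plogis(eta))

# Two-column binomial response: successes and failures out of 3 trials
dat$successes <- dat$Y
dat$failures  <- 3 - dat$Y

# Model with the A:B interaction, and the same model without it
fit.full <- glm(cbind(successes, failures) ~ A * B + Age * (L1 + L2),
                family = binomial, data = dat)
fit.main <- glm(cbind(successes, failures) ~ A + B + Age * (L1 + L2),
                family = binomial, data = dat)

# Likelihood-ratio test for the interaction (1 degree of freedom)
lrt <- anova(fit.main, fit.full, test = "Chisq")
print(lrt)

# Size and standard error of the interaction coefficient
print(coef(summary(fit.full))["A:B", ])
```

The test alone should not settle the matter: as noted above, the size of the interaction coefficient relative to its standard error matters as much as the p-value when deciding whether to drop the term.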

Thank you so much! Also, how does it differ from a model that treats Y as one of the grouping variables and each of the original predictors as the response (e.g. `glm(A~Y+Age*(L1+L2))` and `glm(B~Y+Age*(L1+L2))`)? — hyiltiz, Oct 11 '16 at 03:28
That's a different question, @hyiltiz. (Both in the sense that the substantive scientific question is different, & that that is a different question than posted at this thread--it should probably be posted as a new question.) — gung - Reinstate Monica, Oct 11 '16 at 14:11
