
I'm running an experiment on people's ability to read facial expressions on faces with and without COVID-19 face masks. It's a between-subjects design with repeated measures: each participant saw either 36 images with masks or 36 images without masks, and their task was to say which facial expression each face was showing. The 36 images were of 6 different people, each showing 6 different emotions. Condition 1 was with masks; condition 2 was without masks.

I have a dataframe like this:

id = participant id

baseFace = id for image person

faceAgeGroup = age group of the image person (young/medium/old)

faceSex = gender of image person

faceEmotion = the facial expression of the image person

response = the participant's response, i.e. which facial expression they think they saw

accuracy = whether the response matched faceEmotion (1 = correct, 0 = incorrect)

condition = whether they saw condition 1 stimuli (masks) or condition 2 stimuli (no masks)

id | baseFace | faceAgeGroup | faceSex | faceEmotion | response | accuracy | condition
1    001        young          male       angry        sad        0          1
1    002        medium         female     happy        happy      1          1
1    003        old            female     sad          sad        1          1
1    004        young          male       neutral      sad        0          1
2    001        medium         female     happy        happy      1          2
2    002        old            female     sad          sad        1          2
2    003        young          male       angry        sad        0          2
2    004        medium         female     happy        happy      1          2
3    001        old            female     sad          sad        1          2
3    002        young          male       neutral      sad        0          2
3    003        medium         female     happy        happy      1          2
3    004        old            female     sad          sad        1          2
4    001        young          male       angry        sad        0          1
4    002        medium         female     happy        happy      1          1
4    003        old            female     sad          sad        1          1
4    004        young          male       angry        sad        0          1
5    001        medium         female     happy        happy      1          2
5    002        old            female     sad          sad        1          2
5    003        medium         female     happy        happy      1          2
5    004        old            female     sad          sad        1          2

My hypotheses are:

1. People are better at reading facial expressions in faces without face masks.

2. People will report higher levels of confidence in their decisions when evaluating faces without masks.

AND NOW TO MY QUESTION

Since my outcome is binary (accuracy), I use a logistic regression model:

lme4::glmer(accuracy ~ condition + (1 | id), data = df, family = binomial)

To me this is the most logical way to build the model, since I'm using condition as the predictor of accuracy. My question is: would I violate any rules, or my own hypotheses, if I included more predictors or more random intercepts?
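For example, I'm wondering whether an extended model along these lines would be acceptable (just a sketch of what I have in mind; the extra fixed effects and the crossed random intercept for baseFace are ideas I'm considering, not something I've settled on):

```r
# Sketch of a possible extension (undecided): stimulus-level predictors plus a
# crossed random intercept for the image person (baseFace), in addition to the
# random intercept for participant (id).
library(lme4)

m_ext <- glmer(accuracy ~ condition + faceSex + faceEmotion +
                 (1 | id) + (1 | baseFace),
               data = df, family = binomial)
summary(m_ext)
```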

Here is my whole dataframe (in .csv) if it is useful:

https://drive.google.com/file/d/1uEL3vVM2v_QB4Ep8AXlYIF-hZyJouQzh/view?usp=sharing

Thank you for any help.

I also posted my question on StackOverflow but I was told to try here instead.

PaulCrebs
  • I am not familiar with glmer, only with [glmnet](https://glmnet.stanford.edu/articles/glmnet.html), so I'm not sure what the syntax `(1|id)` means. Intuitively, the more (good) regressors you have, the better (id is most likely not useful). The regression does not seem to be influenced by your hypotheses - it's actually the other way around: you would test for the significance of the coefficient of the feature *no mask* (condition 2). If your hypothesis is right it should be positive (i.e. positively affect accuracy). I'm not sure how you would test your second hypothesis though. – PaulG Dec 20 '20 at 15:12
  • `(1 | id)` is the inclusion of random effects in the model. Terms to the left of `|` get random slopes, and the variable to the right of `|` is the grouping factor - here a random intercept per participant. Okay, I'll take that into consideration. Thank you Paul! – PaulCrebs Dec 20 '20 at 15:38
  • Some people might be easy to read, others less so. As a first response I would try to at least add a random intercept for `id` and see if that improves the model fit. Including more (reasonable) data such as `faceSex` and `faceAgeGroup` may take some noise from the data and so is usually a good idea. – Bernhard Dec 20 '20 at 17:33

1 Answer


Because condition does not change within a subject, I think it will be challenging to interpret its conditional (subject-specific) effect in your binary logistic mixed-effects model. It seems to me that what you really want in this case is to report a marginal effect of condition from your model.

The only R package I know of which reports marginal effects for such a model is GLMMadaptive. With this package, you would fit your model using the mixed_model() function:

m <- mixed_model(fixed = accuracy ~ condition,
                 random = ~ 1 | id,
                 data = df,
                 family = binomial())

summary(m)

The model summary reports the conditional effect of condition on the log odds scale.
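If it helps interpretation, you can also put that conditional estimate on the odds-ratio scale (a sketch, assuming the model m fitted above; as far as I know GLMMadaptive supports the usual fixef() and confint() extractors):

```r
# Conditional (subject-specific) fixed effects, exponentiated from the
# log-odds scale to odds ratios. Assumes `m` from mixed_model() above.
exp(fixef(m))    # point estimates as odds ratios
exp(confint(m))  # Wald confidence intervals on the odds-ratio scale
```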

After fitting the model, you would use the marginal_coefs() function to get the marginal effect of condition on the log odds scale:

marginal_coefs(m, std_errors = TRUE)

See https://cran.r-project.org/web/packages/GLMMadaptive/vignettes/GLMMadaptive_basics.html for more details.

See also this answer from Dr. Dimitris Rizopoulos, the creator of the GLMMadaptive package, to one of my own questions on this forum: Computation and interpretation of marginal effects in a GLMM. While his answer refers to mixed Poisson regression models, you can easily adapt it to your own situation by replacing the log link with the logit link and the expected value of the response with the odds.

Are you considering including a random intercept for the subject appearing in the image (baseFace)? If yes, I don't think GLMMadaptive can handle crossed random effects - you would need to find a different way to compute your desired marginal effects.

If you don't include any predictors besides condition, the model lets you compare the two groups of subjects in your underlying population in terms of marginal odds, ignoring any observed subject characteristics. If you include a subject-level predictor (e.g., gender), you will be able to compare marginal odds between the two conditions for the same gender (e.g., male). Note that marginal_coefs() performs two-sided hypothesis tests, whereas your directional hypotheses call for one-sided tests.
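Converting the reported two-sided Wald test into a one-sided one is straightforward when the estimate is in the hypothesized direction. A sketch with made-up numbers standing in for the estimate and standard error that marginal_coefs(m, std_errors = TRUE) would report:

```r
# Hypothetical numbers for illustration only - substitute the marginal
# log-odds estimate and standard error reported by marginal_coefs().
est <- 0.80  # assumed marginal log odds ratio for condition
se  <- 0.35  # assumed standard error

z     <- est / se
# One-sided p-value for H1: no-mask condition increases accuracy
p_one <- pnorm(z, lower.tail = FALSE)
# Equivalently, half the two-sided p-value when the sign matches H1
p_two <- 2 * pnorm(abs(z), lower.tail = FALSE)
all.equal(p_one, p_two / 2)  # TRUE
```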

Isabella Ghement