Nested data in logistic regression

Question

Setting:

In my study, each of three readers (A, B, C) applies three different qualitative scores (score1, score2, score3, each on an ordinal scale) to the same set of 95 cases. For example, score1 is the reader's rating of the severity of a case (1 = not severe, 10 = extremely severe).

For some of the cases (cases 92-95 in the example data) the readers apply the scores multiple times (i.e., at different time points, with no special event between the time points).

Example data:

library("lme4")
#> Loading required package: Matrix

# example data
set.seed(1)
df <- data.frame(reader=rep(c("A","B","C"),each=100),
                 case=rep((c(rep(1:91),92,92,93,93,94,94,95,95,95)),3),
                 score1=sample(1:10,300,replace=TRUE),
                 score2=sample(5:10,300,replace=TRUE),
                 score3=sample(2:10,300,replace=TRUE),
                 class=sample(0:1,300,replace=TRUE))
str(df)
#> 'data.frame':    300 obs. of  6 variables:
#>  $ reader: chr  "A" "A" "A" "A" ...
#>  $ case  : num  1 2 3 4 5 6 7 8 9 10 ...
#>  $ score1: int  9 4 7 1 2 7 2 3 1 5 ...
#>  $ score2: int  5 8 8 9 8 5 9 10 9 6 ...
#>  $ score3: int  5 6 7 5 3 2 4 2 4 3 ...
#>  $ class : int  0 1 1 1 1 1 1 1 0 1 ...

Created on 2022-02-23 by the reprex package (v2.0.1)

Aim:

Now I would like to investigate the association between score1-3 (independent variables) and class (dependent variable, 1 or 0).

I could do this with a simple logistic regression in R:

glm(class ~ score1 + score2 + score3, family="binomial", data = df)

However, since the data are clustered/nested, the observations are not independent, and I think the p-values I get for the independent variables are too low.

Question:

Which analysis is most appropriate for the level of nesting in my data?

My solutions:

Averaging

Average each of score1-3 across the readers and across the repeated measurements of a case, then perform a simple logistic regression as mentioned above.
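A minimal sketch of this averaging approach, using base R on the example data. It assumes class is a property of the case; the simulated class varies within a case, so the first recorded value per case is used as a stand-in. The names df_avg and mod_avg are illustrative only.

```r
# Rebuild the example data from the question
set.seed(1)
df <- data.frame(reader = rep(c("A", "B", "C"), each = 100),
                 case   = rep(c(1:91, 92, 92, 93, 93, 94, 94, 95, 95, 95), 3),
                 score1 = sample(1:10, 300, replace = TRUE),
                 score2 = sample(5:10, 300, replace = TRUE),
                 score3 = sample(2:10, 300, replace = TRUE),
                 class  = sample(0:1, 300, replace = TRUE))

# Mean of each score per case, pooling readers and repeated measurements
df_avg <- aggregate(cbind(score1, score2, score3) ~ case, data = df, FUN = mean)

# Assumption: class is a case-level label. In the simulated data it is not,
# so the first recorded value per case is taken here as a placeholder.
df_avg$class <- df$class[match(df_avg$case, df$case)]

# One row per case, so the rows are no longer clustered by reader or case
mod_avg <- glm(class ~ score1 + score2 + score3, family = "binomial", data = df_avg)
summary(mod_avg)
```

Note that averaging treats the ordinal scores as numeric and discards the variability between readers, which may or may not be acceptable for the study.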

Use a mixed-effects model

I found some advice for nested data: "Mixed Effects Model with Nesting" and "What is the difference between fixed effect, random effect and mixed effect models?"

However, since I am new to mixed-effects models, I am not sure which variables should be treated as fixed and which as random:

Only reader as random effect

mod1 <- glmer(class ~ score1 + score2 + score3 + (1 | reader), family = "binomial", data = df)
#> boundary (singular) fit: see help('isSingular')

Reader and case as random effect

mod2 <- glmer(class ~ score1 + score2 + score3 + (1 | reader/case), family = "binomial", data = df)
#> boundary (singular) fit: see help('isSingular')

I think I get the warning boundary (singular) fit: see help('isSingular') because the effects are very small in the test data.
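One way to check this explanation is to look at the estimated variance components. The sketch below uses lme4's isSingular() and VarCorr() on the simulated data, where class is generated independently of reader, so a reader variance at or near zero would be expected. No output is shown because it depends on the R and lme4 versions.

```r
library(lme4)

# Rebuild the example data from the question
set.seed(1)
df <- data.frame(reader = rep(c("A", "B", "C"), each = 100),
                 case   = rep(c(1:91, 92, 92, 93, 93, 94, 94, 95, 95, 95), 3),
                 score1 = sample(1:10, 300, replace = TRUE),
                 score2 = sample(5:10, 300, replace = TRUE),
                 score3 = sample(2:10, 300, replace = TRUE),
                 class  = sample(0:1, 300, replace = TRUE))

mod1 <- glmer(class ~ score1 + score2 + score3 + (1 | reader),
              family = "binomial", data = df)

# TRUE when a variance component is estimated at (or very near) zero
isSingular(mod1)

# Estimated standard deviation of the reader intercepts
VarCorr(mod1)
```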

Toby · Answer 1 · 2022-03-02T13:16:53.867

Your example data are not meaningful, so I can only give advice based on the description of your data.

I assume reader1 provides score1 etc. Therefore the scores are nested and you should use a GLMM. You probably can build a model like this:

mod <- glmer(class ~ score1 + score2 + score3 + (1 + score1 + score2 + score3 | reader), family = "binomial", data = df)

This treats the intercept, score1, score2 and score3 as both fixed and random effects (random across readers). I'm not sure about your variable case. There might simply be too few cases per reader.

If you don't know whether your variables should be treated as fixed or random, see this question on CV.

In general, there is no straightforward way to build your model. It depends on the given data. You need to check whether the variables in question are significant in your model.

Edit

Why I think you should not use reader/case:

Mixed models, as I understand them, assume dependence within groups and independence between groups. Think of students in schools. There you can group students into classes. All classes of one school are subject to the same influences, but classes from another school are not. In your data I don't see this structure, since each reader creates one score for each case. Each case is handled by each reader.

Therefore I'd say you have grouped, but not nested data within groups.
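As a sketch of how this could be checked and modelled in lme4 (not part of the original advice): a cross-tabulation shows whether every reader scored every case, and, assuming "grouped, but not nested" corresponds to crossed random intercepts, the two grouping factors enter as separate terms rather than as reader/case. The names tab and mod_crossed are illustrative only.

```r
library(lme4)

# Rebuild the example data from the question
set.seed(1)
df <- data.frame(reader = rep(c("A", "B", "C"), each = 100),
                 case   = rep(c(1:91, 92, 92, 93, 93, 94, 94, 95, 95, 95), 3),
                 score1 = sample(1:10, 300, replace = TRUE),
                 score2 = sample(5:10, 300, replace = TRUE),
                 score3 = sample(2:10, 300, replace = TRUE),
                 class  = sample(0:1, 300, replace = TRUE))

# Number of observations for each reader-by-case combination
tab <- xtabs(~ reader + case, data = df)

# If every cell is non-empty, each reader scored each case (crossed design)
all(tab > 0)

# Assumption: crossed random intercepts for reader and case
mod_crossed <- glmer(class ~ score1 + score2 + score3 + (1 | reader) + (1 | case),
                     family = "binomial", data = df)
VarCorr(mod_crossed)
```

With the simulated data this fit may also be singular, since class is generated independently of both reader and case.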

edited Mar 02 '22 at 13:16

answered Feb 23 '22 at 12:30


Thank you for your thoughts. Each reader provides all three scores (score1, score2, score3) for each case. This was ambiguous in the data description (pardon!) and I corrected it. – ava Feb 23 '22 at 12:44
Is your answer still valid with the information from my comment above? – ava Feb 24 '22 at 01:30
In this case you could treat your variables as fixed and random effects within cases: `class ~ (1 + score1 + score2 + score3|reader) + (1 + score1 + score2 + score3|case)`. Depending on your data, this might lead to poor estimates because there are many parameters and few data. – Toby Feb 25 '22 at 09:41
Thank you. Why is `score1 + score2 + score3+ (1|reader/case)` not appropriate? – ava Mar 01 '22 at 18:11
Please see Edits – Toby Mar 02 '22 at 13:17
Thank you for the explanations. `class ~ (1 + score1 + score2 + score3|reader) + (1 + score1 + score2 + score3|case)` treats only the intercept as a fixed effect, correct? Should the other variables not also be treated as fixed effects? – ava Mar 02 '22 at 16:14
