Multilevel analysis of survey data

Question

I am trying to conduct a multilevel analysis of data from the PISA 2018 survey, but I am not quite sure how to formulate my model. The data I am using can be downloaded from: https://www.oecd.org/pisa/data/2018database/ . The idea is to use students' Mathematics scores as the dependent variable in one regression and their Reading scores in another. Both will be regressed on student-level (level-1) explanatory variables such as gender and socio-economic status, and on school-level (level-2) variables such as school size and whether the school is private or public. The set-up for my data is the following:

library(intsvy)
library(plyr)  # provides revalue()

# Merge the student and school questionnaire files for the selected countries
pisa = pisa.select.merge(folder = file.path("YourDirectory"),
                         student.file = "CY07_MSU_STU_QQQ.sav",
                         school.file = "CY07_MSU_SCH_QQQ.sav",
                         student = c("ST004D01T", "ESCS", "IMMIG"),
                         school = c("SCHLTYPE", "SCHSIZE", "STRATIO", "EDUSHORT", "STAFFSHORT"),
                         countries = c("BEL", "CZE", "DNK", "ESP", "FRA", "GBR", "GRC", "HUN", "ITA",
                                       "LTU", "NLD", "NOR", "POL", "SVK", "SVN"))

# Replace the numeric country codes with country names
pisa$CNTRYID = as.factor(pisa$CNTRYID)
pisa$CNTRYID = revalue(pisa$CNTRYID, c("56" = "Belgium", "203" = "Czech Republic", "208" = "Denmark", "724" = "Spain",
                                       "250" = "France", "826" = "United Kingdom", "300" = "Greece", "348" = "Hungary",
                                       "380" = "Italy", "440" = "Lithuania", "528" = "Netherlands", "578" = "Norway",
                                       "616" = "Poland", "703" = "Slovak Republic", "705" = "Slovenia"))

I would like to use the "lme4" package for my analysis. I have a rough idea of what my regression should look like; here is an illustrative example of the null model using the "lme4" and "lmerTest" packages:

library(lme4)
library(lmerTest)

reg = lmer(MathScores ~ 1 + (1|studentid) + (1|schoolid), data = pisa)

but I don't know which variable to use for MathScores. This is because the data contain 10 plausible values (PVs) each for Math and Reading, each in a separate column: PV1MATH, PV2MATH, ..., PV10MATH, and likewise for READ. So I am not sure whether I should take the mean or do something else. You can have a look here: https://www.r-bloggers.com/sampling-weights-and-multilevel-modeling-in-r/ , where I found an analysis similar to what I would like to do. In that post, the author uses PV1MATH as the dependent variable, but I did not find any reasoning for choosing this particular PV.

I would appreciate any help that could point me in the right direction. If there is any information I have not provided, feel free to let me know and I will do my best to provide it. Thank you in advance!

P.S. I tried asking on Stack Overflow but was told that this is more of a statistical question, which is why I am posting it here.

score 1 · Answer 1 · answered Mar 26 '20 at 05:21

The plausible values are what other branches of statistics call multiple imputations. So you want to fit the same model with each of PV1MATH, ..., PV10MATH as the outcome, and then combine the results according to Rubin's rules or similar.
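As a rough sketch of this workflow, the following fits the null two-level model once per plausible value and pools the fixed effects with MIcombine() from the mitools package. The data are simulated so that the example runs on its own; the toy data frame, its effect sizes, and the use of CNTSCHID as the school identifier are assumptions made for illustration, and sampling weights are ignored here.

```r
library(lme4)     # lmer(), fixef()
library(mitools)  # MIcombine()

# Simulated stand-in for the merged PISA data (NOT real PISA values):
# 30 schools x 20 students, with ten plausible-value columns.
set.seed(1)
n_school <- 30
n_per    <- 20
school   <- rep(seq_len(n_school), each = n_per)
toy      <- data.frame(CNTSCHID = factor(school))  # assumed school ID name

school_effect <- rnorm(n_school, sd = 30)[school]
ability       <- 500 + school_effect + rnorm(nrow(toy), sd = 70)

pv_names <- paste0("PV", 1:10, "MATH")
for (pv in pv_names) {
  # each plausible value = common ability + imputation noise
  toy[[pv]] <- ability + rnorm(nrow(toy), sd = 25)
}

# Fit the same null model once per plausible value.
fits <- lapply(pv_names, function(pv) {
  lmer(as.formula(paste(pv, "~ 1 + (1 | CNTSCHID)")), data = toy)
})

# Pool the fixed effects across the ten fits with Rubin's rules.
pooled <- MIcombine(
  results   = lapply(fits, fixef),
  variances = lapply(fits, function(m) as.matrix(vcov(m)))
)
summary(pooled)
```

With the real merged data one would presumably replace toy with pisa and extend the formula with the level-1 and level-2 covariates. This pools only the fixed effects; the variance components would have to be summarised separately.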

The combined point estimate is just the average of the point estimates from each model. The standard errors are more complicated: they incorporate both the estimated standard errors from each model and the between-imputation variation in the point estimates. Packages that can do this include mitools and mice.
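Written out for a single coefficient, and assuming the standard form of Rubin's rules, the combination looks as follows. The symbols are notation introduced here for illustration: m = 10 is the number of plausible values, \hat\theta_i is the estimate from the model fitted to the i-th plausible value, and U_i is its squared standard error.

```latex
\begin{align*}
\bar\theta &= \frac{1}{m}\sum_{i=1}^{m}\hat\theta_i
  && \text{(combined point estimate)}\\
\bar U &= \frac{1}{m}\sum_{i=1}^{m} U_i
  && \text{(average within-imputation variance)}\\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat\theta_i-\bar\theta\bigr)^2
  && \text{(between-imputation variance)}\\
T &= \bar U + \Bigl(1+\frac{1}{m}\Bigr)B,
  \qquad \operatorname{SE}(\bar\theta)=\sqrt{T}
  && \text{(total variance and standard error)}
\end{align*}
```

Using a single plausible value, or the mean of the ten, drops the B term and so tends to understate the standard error.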

answered Mar 26 '20 at 05:21

Thomas Lumley

Thank you for your answer! I will have a look at the packages you mentioned. I have also seen that you have created the package svy2lme (if that is indeed you, of course), but the description says that it only allows two-level modelling at the moment. Is it going to support three-level modelling as well? In any case, I will have a look at it too, since it looks like it fits the aims of some analyses I plan to conduct. – G_Konyarov Mar 28 '20 at 10:20
As far as I understand, at the moment there are no packages that can fit a multilevel model with a complex survey design. I have looked at this post: https://stats.stackexchange.com/questions/89204/fitting-multilevel-models-to-complex-survey-data-in-r and also at other places on the internet, but there seems to be no solution to this problem as of today. – G_Konyarov Apr 01 '20 at 10:46
