I am trying to conduct a multilevel analysis of data from the PISA 2018 survey, but I am not sure how to formulate my model. The data I am using can be downloaded from: https://www.oecd.org/pisa/data/2018database/ . The idea is to use students' Mathematics scores as the dependent variable in one regression and their Reading scores in another. Both will be regressed on student-level (level-1) explanatory variables such as gender and socio-economic status, and on school-level (level-2) variables such as school size and whether the school is private or public. The set-up for my data is the following:
library(intsvy)
library(plyr)  # provides revalue(), used below

# Merge the student and school files, keeping only the selected variables and countries
pisa = pisa.select.merge(folder = file.path("YourDirectory"),
                         student.file = "CY07_MSU_STU_QQQ.sav",
                         school.file = "CY07_MSU_SCH_QQQ.sav",
                         student = c("ST004D01T", "ESCS", "IMMIG"),
                         school = c("SCHLTYPE", "SCHSIZE", "STRATIO", "EDUSHORT", "STAFFSHORT"),
                         countries = c("BEL", "CZE", "DNK", "ESP", "FRA", "GBR", "GRC", "HUN", "ITA",
                                       "LTU", "NLD", "NOR", "POL", "SVK", "SVN"))

# Replace the numeric country codes with country names
pisa$CNTRYID = as.factor(pisa$CNTRYID)
pisa$CNTRYID = revalue(pisa$CNTRYID, c("56" = "Belgium", "203" = "Czech Republic", "208" = "Denmark", "724" = "Spain",
                                       "250" = "France", "826" = "United Kingdom", "300" = "Greece", "348" = "Hungary",
                                       "380" = "Italy", "440" = "Lithuania", "528" = "Netherlands", "578" = "Norway",
                                       "616" = "Poland", "703" = "Slovak Republic", "705" = "Slovenia"))
I would like to use the "lme4" package for my analysis. I have a rough idea of what my regression should look like. Here is an illustrative example of the null model, using the "lme4" and "lmerTest" packages:
library(lme4)
library(lmerTest)
# MathScores, studentid and schoolid are placeholders, not actual column names
reg = lmer(MathScores ~ 1 + (1|studentid) + (1|schoolid), data = pisa)
but I don't know which variable to use for MathScores. The data contain 10 plausible values (PVs) each for Math and Reading, every one in its own column: PV1MATH, PV2MATH ... PV10MATH, and the same for READ. I am not sure whether I should take their mean or do something else. A similar analysis to the one I have in mind is described here: https://www.r-bloggers.com/sampling-weights-and-multilevel-modeling-in-r/ . The author of that post uses PV1MATH as the dependent variable, but I could not find any reasoning for choosing that particular PV.
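To make the question concrete, here is a minimal sketch of one alternative to averaging that I have seen mentioned: fit the same model once per plausible value and pool the results with Rubin's rules. I am not sure this is the right approach for my case. The sketch runs on simulated data rather than the real files; the PV column names match PISA, but the use of CNTSCHID as the school identifier, the simulated values, and the omission of the sampling weights are all assumptions made for illustration.

```r
library(lme4)

set.seed(1)

# Simulated stand-in for the merged data (assumption: 30 schools, 20 students each)
n_schools  <- 30
n_students <- 20
sim <- data.frame(CNTSCHID = factor(rep(seq_len(n_schools), each = n_students)))
school_effect <- rep(rnorm(n_schools, sd = 30), each = n_students)
ability <- 500 + school_effect + rnorm(nrow(sim), sd = 80)

# Ten plausible values per student: the same latent ability plus independent noise
pv_names <- paste0("PV", 1:10, "MATH")
for (pv in pv_names) {
  sim[[pv]] <- ability + rnorm(nrow(sim), sd = 20)
}

# Fit the null model (random intercept for school) once per plausible value
fits <- lapply(pv_names, function(pv) {
  lmer(as.formula(paste(pv, "~ 1 + (1 | CNTSCHID)")), data = sim)
})

# Intercept estimate and its sampling variance from each fit
est <- sapply(fits, function(f) fixef(f)[["(Intercept)"]])
se2 <- sapply(fits, function(f) as.matrix(vcov(f))[1, 1])

# Rubin's rules: pooled estimate is the mean; total variance combines the
# average within-fit variance and the between-fit variance of the estimates
M          <- length(fits)
pooled_est <- mean(est)
within     <- mean(se2)
between    <- var(est)
pooled_se  <- sqrt(within + (1 + 1 / M) * between)

print(c(estimate = pooled_est, se = pooled_se))
```

The pooled standard error is never smaller than the average per-fit standard error, because it also reflects the disagreement between the ten fits. Using a single PV, or the mean of the PVs, would not capture that component.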
I would appreciate any help that points me in the right direction. If I have left out any information, please let me know and I will do my best to provide it. Thank you in advance!
P.S. I tried asking on Stack Overflow, but I was told that this is more of a statistical question, which is why I am posting it here.