I'm modeling a dataset with multiple years in period 1 and a single year in period 2. The main question is whether period 1 differs from period 2. The years are therefore modeled as random.
However, the output I'm getting surprised me, since it looks like the random effects estimated are not as variable as they seem from the data. Is the single random level in the period 2 reducing the estimated random variability? If so - is there a better way of modeling these data?
toy <- structure(list(Period = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("Period1", "Period2"), class = "factor"),
Value = c(5332.81481481481, 2711.11111111111, 1600, 622.222222222222,
1198, 2190, 2043, 2345, 1957, 1138, 474, 1328, 1647, 1155,
1672, 2000, 5113, 2018, 2650, 9512, 793, 50, 550, 3900, 3215,
4900, 5075, 3000, 6100, 8000, 4000), YearFac = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L
), .Label = c("2006", "2013", "2017", "2018", "2019"), class = "factor")),
row.names = c(NA,
-31L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("Period",
"Value", "YearFac"))
Analysis
library(plyr)
library(nlme)
library(ggplot2)
mod <- lme(Value ~ Period, random = ~1|YearFac, data = toy, method = "ML")
toy$Pred <- predict(mod, level = 0)
toy$Pred.random <- predict(mod)
means <- ddply(toy, .(Period, YearFac), summarize, Mean = mean(Value))
ggplot(toy) +
geom_point(aes(x = Period, y = Value, colour = YearFac),
position = position_dodge(width = 0.4)) +
geom_point(aes(x = Period, y = Pred, shape = "Pop-level prediction"), size = 3) +
geom_point(aes(x = Period, y = Pred.random, shape = "Random prediction", group = YearFac),
position = position_dodge(width = 0.4), size = 2) +
geom_point(data = means, aes(x = Period, y = Mean,
group = YearFac, fill = "mean"), shape = 21, size = 2,
position = position_dodge(width = 0.4)) +
theme_bw()