2

I'm modeling a dataset with multiple years in period 1 and a single year in period 2. The main question is whether period 1 differs from period 2. The years are therefore modeled as random.

However, the output I'm getting surprised me, since it looks like the random effects estimated are not as variable as they seem from the data. Is the single random level in the period 2 reducing the estimated random variability? If so - is there a better way of modeling these data?

toy <- structure(list(Period = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("Period1", "Period2"), class = "factor"), 
Value = c(5332.81481481481, 2711.11111111111, 1600, 622.222222222222, 
1198, 2190, 2043, 2345, 1957, 1138, 474, 1328, 1647, 1155, 
1672, 2000, 5113, 2018, 2650, 9512, 793, 50, 550, 3900, 3215, 
4900, 5075, 3000, 6100, 8000, 4000), YearFac = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L
), .Label = c("2006", "2013", "2017", "2018", "2019"), class = "factor")), 
row.names = c(NA, 
-31L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("Period", 
"Value", "YearFac"))

Analysis

library(plyr)
library(nlme)
library(ggplot2)

mod <- lme(Value ~ Period, random = ~1|YearFac, data = toy, method = "ML")

toy$Pred <- predict(mod, level = 0)
toy$Pred.random <- predict(mod)

means <- ddply(toy, .(Period, YearFac), summarize, Mean = mean(Value))

ggplot(toy) + 
geom_point(aes(x = Period, y = Value, colour = YearFac), 
    position = position_dodge(width = 0.4)) + 
geom_point(aes(x = Period, y = Pred, shape = "Pop-level prediction"), size = 3) + 
geom_point(aes(x = Period, y = Pred.random, shape = "Random prediction", group = YearFac), 
    position = position_dodge(width = 0.4), size = 2) +
geom_point(data = means, aes(x = Period, y = Mean, 
    group = YearFac, fill = "mean"), shape = 21, size = 2,
    position = position_dodge(width = 0.4)) +
theme_bw()
user2602640
  • 235
  • 1
  • 9
  • 1) Your years are nested in period, which is not in your model 2) The fact that you have only one year in period 2 is a problem, and 3) There should be less variability in the random intercepts than in the means: https://stats.stackexchange.com/questions/32522/why-are-random-effects-shrunk-towards-0 – mkt Dec 08 '17 at 16:39
  • Since the years are unique, I was under the impression that explicit nesting isn't required (in fact, it'll mess up the model). I need to understand how much of a problem the having the one year poses – user2602640 Dec 08 '17 at 16:40
  • It is precisely because they are unique that I thought they should be nested. But I'm not sure. Here's some relevant discussion: https://stats.stackexchange.com/questions/228800/crossed-vs-nested-random-effects-how-do-they-differ-and-how-are-they-specified – mkt Dec 08 '17 at 16:43
  • Not to my understanding. From the post you linked to: `"The cross tabulation shows that each level of class occurs only in one level of school, as per your definition of nesting. Both model formulations will now produce the same output"` – user2602640 Dec 08 '17 at 16:55

0 Answers0