
I have a dataset of resource consumption measurements in buildings over a number of years. I am interested in how resource consumption in my study area differs between years (as opposed to differences between individual buildings). I've fitted a linear mixed model to my data with the lme4 package in R using the call: `model = lmer(resource.consumption ~ year + (1|building.id))`

I would like to write this as an equation so that readers unfamiliar with R can understand what the model estimates. However, I am having trouble working out how to do this, given that 'year' is a factor in this scenario. The summary() function gives the following output:

Linear mixed model fit by REML ['lmerMod']
Formula: resource.consumption. ~ year + (1 | building.id)
   Data: year.comp

REML criterion at convergence: 122.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.1312 -0.4170 -0.0711  0.3419  5.0172 

Random effects:
 Groups      Name        Variance Std.Dev.
 building.id (Intercept) 0.07294  0.2701  
 Residual                0.04537  0.2130  
Number of obs: 368, groups:  building.id, 107

Fixed effects:
               Estimate Std. Error t value
(Intercept)     1.32746    0.05565  23.855
year2007       -0.24504    0.06029  -4.064
year2008       -0.36634    0.05817  -6.298
year2009       -0.44730    0.05551  -8.057
year2010       -0.47449    0.05391  -8.801
year2011       -0.53752    0.05524  -9.730

Correlation of Fixed Effects:
            (Intr) i.2007 i.2008 i.2009 i.2010
yr2007      -0.696                            
yr2008      -0.710  0.657                     
yr2009      -0.775  0.697  0.714              
yr2010      -0.803  0.720  0.735  0.802       
yr2011      -0.801  0.704  0.722  0.800  0.825

From here and here I think I've narrowed my options to the following (please excuse the messy notation; hopefully it is readable):

$$y_{im} = \beta_0 + \beta_1 \text{year}_{im} + b_{0m} +\epsilon_{im} $$

where $i$ indexes the observations and $m$ indexes the grouping variable (building.id in this case)

OR

$$ y_{imj} = \beta_0 + \sum\beta_{1m}[year]_{im} + b_{0j}[building.id]_j + \epsilon_{imj} $$

where $i$ indexes the observations, $m$ indexes the year, and $j$ indexes the building (building.id).

Is either of these correct? Any help would be hugely appreciated!

kjetil b halvorsen
Robin_H

1 Answer


Your second formulation,

$$ y_{imj} = \beta_0 + \sum\beta_{1m}[year]_{im} + b_{0j}[building.id]_j + \epsilon_{imj}, $$

is correct. Depending on your audience, it might be clearer to use the more general mixed model notation and write this as

$$ \begin{split} y_i & \sim N(\eta_i,\sigma^2) \\ \eta_i & = \beta_0 + \beta_{1,m(i)} + b_{j(i)} \\ b_j & \sim N(0,\sigma^2_b) \end{split} $$ where $m(i)$ gives the year and $j(i)$ gives the building corresponding to the $i^{\textrm{th}}$ observation.
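As a rough illustration, the estimates from the summary output in the question can be plugged into this notation. This is only a sketch, assuming R's default treatment contrasts for the factor `year`, so that $\beta_0$ is the mean for the baseline year (the factor level absent from the coefficient table, which the output does not name) and $\beta_{1,m} = 0$ for that year. The indicator $I(\cdot)$, equal to 1 when its condition holds and 0 otherwise, is notation introduced here for illustration, and the estimates are rounded to three decimals:

$$ \begin{split} \hat{\eta}_i = {} & 1.327 - 0.245\, I(m(i) = 2007) - 0.366\, I(m(i) = 2008) - 0.447\, I(m(i) = 2009) \\ & - 0.474\, I(m(i) = 2010) - 0.538\, I(m(i) = 2011) + \hat{b}_{j(i)}, \\ \hat{\sigma}_b = {} & 0.270, \qquad \hat{\sigma} = 0.213 \end{split} $$

Under these assumptions, the expected consumption for a typical building ($b_j = 0$) in 2011 would be about $1.327 - 0.538 = 0.790$, compared with $1.327$ in the baseline year.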

Ben Bolker
  • Thanks! I'm fairly new to mixed models so the general notation is new to me but it seems like it would be simpler to follow for my audience. Thanks again. – Robin_H Aug 31 '15 at 03:04