I'm looking for help with the notation for a regression equation in a repeated measures model with nested data, $\eqref{eq:2}$, and with connecting that notation back to my model specification, Model.3, in R.

My starting point, i.e. the notation I am familiar with, is the one used in Wooldridge's Introductory Econometrics (2013). Wooldridge's notation for the general longitudinal model, for both Random Effects and Fixed Effects estimation, can be written as,

$$ y_{it} = \beta_{0}+\beta_{1}x_{it} + a_{i}+u_{it} \tag{1} \label{eq:1} $$

where individuals are indexed by $i = 1, 2, \ldots, n$ and time is indexed by $t = 1, 2, \ldots, T$. The error term has two parts: $a_i$, an unobserved individual-specific component that captures unobserved, time-constant factors, and $u_{it}$, the idiosyncratic error, which captures unobserved factors that change over time.

In R I've been estimating a Random Effects version of this model using the plm package. First, some required packages and some data,

# install.packages(c("plm", "lme4", "texreg", "mlmRev"), dependencies = TRUE)
data(egsingle, package = "mlmRev")

The data set egsingle is an unbalanced panel consisting of 1721 school children, grouped in 60 schools, across five time points. For details, see ?mlmRev::egsingle

Some light data management

dta <- egsingle
dta$Female <- with(dta, ifelse(female == 'Female', 1, 0))

Also, a snippet of the relevant data

dta[118:127,c('schoolid','childid','math','year','size','Female')]
#>     schoolid   childid   math year size Female
#> 118     2040 289970511 -1.830 -1.5  502      1
#> 119     2040 289970511 -1.185 -0.5  502      1
#> 120     2040 289970511  0.852  0.5  502      1
#> 121     2040 289970511  0.573  1.5  502      1
#> 122     2040 289970511  1.736  2.5  502      1
#> 123     2040 292772811 -3.144 -1.5  502      0
#> 124     2040 292772811 -2.097 -0.5  502      0
#> 125     2040 292772811 -0.316  0.5  502      0
#> 126     2040 293550291 -2.097 -1.5  502      0
#> 127     2040 293550291 -1.314 -0.5  502      0
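The rows above already hint at both features that matter later: each child sits in one school, and children contribute different numbers of observations. A minimal sketch of checking this in base R, using only the ten rows printed above (retyped by hand, so this is not the full data; `snippet` is a name introduced here for illustration, and a child may have further rows beyond row 127):

```r
# The ten rows printed above, retyped by hand (not the full egsingle data)
snippet <- data.frame(
  schoolid = rep("2040", 10),
  childid  = rep(c("289970511", "292772811", "293550291"), times = c(5, 3, 2)),
  year     = c(-1.5, -0.5, 0.5, 1.5, 2.5, -1.5, -0.5, 0.5, -1.5, -0.5),
  stringsAsFactors = FALSE
)

# Nesting: every child should appear in exactly one school
schools_per_child <- tapply(snippet$schoolid, snippet$childid,
                            function(s) length(unique(s)))
all(schools_per_child == 1)

# Balance: number of observed years per child within these rows
obs_per_child <- table(snippet$childid)
obs_per_child
```

Replacing `snippet` with `dta` should run the same two checks on the full data.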

Now, here’s how I would specify the Random Effects Model in R, ignoring the schoolid, based on $\eqref{eq:1}$, using plm() and estimating with FGLS,

library(plm)
Model.1 <- plm(math ~ Female + size + year, dta, index = c("childid", "year"), model = "random")
# summary(Model.1)

However, as mentioned at the top, the data is also nested. That is, childid is nested in schoolid. To write this regression equation I've simply extended $\eqref{eq:1}$ by adding a school-subscript, $s$,

$$ y_{ist} = \beta_{0}+\beta_{1}x_{ist} + a_{i}+\nu_{s}+u_{ist} \tag{2} \label{eq:2} $$

Now $y$, $x$, and the idiosyncratic error, $u$, are extended with an $s$ dimension, and the combined error, which in $\eqref{eq:1}$ consists of two parts, is extended in $\eqref{eq:2}$ by a third term, $\nu_{s}$. This term captures the unobserved group/school-specific component. I am not confident that this specification is correct. I might be confused by the differences in jargon across the literature.

Part 1 Is $\eqref{eq:2}$ a correct way to specify a regression equation for a repeated measures random effects model with a nested structure? Is there any authoritative literature that uses notation similar to this?

This next part, Part 2, is no longer that relevant.

I have tried to find a way to estimate what I believe is $\eqref{eq:2}$ using plm, but I haven't succeeded. Part 2 Is it possible to estimate a repeated measures random effects model with a nested structure using the plm package? Based on this question, I believe the answer is yes: it is possible to estimate a _repeated measures random effects model with a nested structure_ using the plm package; see the question linked above.

After studying this great answer by Robert Long, I have estimated a repeated measures model, with childid nested in schoolid, using the lme4 package. Like this,

dta$year <- as.factor(dta$year) 
require(lme4)

As the lme4 package relies on a likelihood framework, I begin by estimating a model similar to Model.1 above (for later comparison). Like this,

Model.2 <- lmer(math ~ Female + size + year + (1 | childid), dta)

Now, relying on Robert Long's answer I've specified the nested model like this,

Model.3 <- lmer(math ~ Female + size + year + (1 | schoolid/childid), dta)
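As far as I understand lme4's formula syntax, the `schoolid/childid` shorthand expands to a school intercept plus a child-within-school intercept. A sketch of checking this by fitting both spellings and comparing log-likelihoods; `Model.3.long` is a name introduced here for illustration, and the block rebuilds the data so it can run on its own (it assumes lme4 and mlmRev are installed):

```r
library(lme4)

# Rebuild the data as above so this block runs on its own
data(egsingle, package = "mlmRev")
dta <- egsingle
dta$Female <- with(dta, ifelse(female == "Female", 1, 0))
dta$year <- as.factor(dta$year)

# Shorthand nesting syntax, as in Model.3
Model.3 <- lmer(math ~ Female + size + year + (1 | schoolid/childid), dta)

# The same random-effects structure written out term by term
# (Model.3.long is a hypothetical name used only for this comparison)
Model.3.long <- lmer(math ~ Female + size + year +
                       (1 | schoolid) + (1 | schoolid:childid), dta)

# The two log-likelihoods should agree up to optimizer tolerance
all.equal(as.numeric(logLik(Model.3)), as.numeric(logLik(Model.3.long)),
          tolerance = 1e-6)
```

If the two fits agree, the written-out form makes the link to the two random terms $\nu_{s}$ and $a_{i}$ in $\eqref{eq:2}$ easier to see.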

Assuming Model.3 is correctly specified:

Part 3.a What authoritative source do you recommend, preferably with notation similar to Wooldridge (2013), that presents and discusses the notation for the regression equation of the model I am estimating in Model.3?

Part 3.b Is $\eqref{eq:2}$ actually what I am estimating in Model.3?
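To make Part 3.b concrete, here is a sketch of how I read $\eqref{eq:2}$ once $x_{ist}$ is taken to be a vector holding Female, size, and the year dummies, with $\boldsymbol{\beta}$ the matching coefficient vector. The normality and independence assumptions are my assumption about what a random-intercept model of this kind imposes; they are not taken from Wooldridge (2013).

$$ y_{ist} = \beta_{0} + \mathbf{x}_{ist}'\boldsymbol{\beta} + a_{i} + \nu_{s} + u_{ist}, \qquad a_{i} \sim N(0, \sigma_{a}^{2}), \quad \nu_{s} \sim N(0, \sigma_{\nu}^{2}), \quad u_{ist} \sim N(0, \sigma_{u}^{2}) $$

with the three random terms assumed mutually independent. Under this reading, a fitted model would report one variance per random term: one for children, one for schools, and one for the residual.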

Below are the actual estimation results from the three models,

# require(texreg)
texreg::screenreg(list(Model.1, Model.2, Model.3), digits = 3)    
#> =============================================================================
#>                                    Model 1       Model 2        Model 3      
#> -----------------------------------------------------------------------------
#> (Intercept)                          -2.671 ***     -2.669 ***     -2.693 ***
#>                                      (0.085)        (0.086)        (0.152)   
#> Female                               -0.025         -0.025          0.008    
#>                                      (0.046)        (0.047)        (0.042)   
#> size                                 -0.000 ***     -0.000 ***     -0.000    
#>                                      (0.000)        (0.000)        (0.000)   
#> year-1.5                              0.878 ***      0.876 ***      0.866 ***
#>                                      (0.059)        (0.059)        (0.059)   
#> year-0.5                              1.882 ***      1.880 ***      1.870 ***
#>                                      (0.059)        (0.058)        (0.058)   
#> year0.5                               2.575 ***      2.574 ***      2.562 ***
#>                                      (0.059)        (0.059)        (0.059)   
#> year1.5                               3.149 ***      3.147 ***      3.133 ***
#>                                      (0.060)        (0.059)        (0.059)   
#> year2.5                               3.956 ***      3.954 ***      3.939 ***
#>                                      (0.060)        (0.060)        (0.060)   
#> -----------------------------------------------------------------------------
#> R^2                                   0.735                                  
#> Adj. R^2                              0.735                                  
#> Num. obs.                          7230           7230           7230        
#> AIC                                              16855.629      16590.715    
#> BIC                                              16924.489      16666.461    
#> Log Likelihood                                   -8417.815      -8284.357    
#> Num. groups: childid                              1721                       
#> Var: childid (Intercept)                             0.857                   
#> Var: Residual                                        0.334          0.334    
#> Num. groups: childid:schoolid                                    1721        
#> Num. groups: schoolid                                              60        
#> Var: childid:schoolid (Intercept)                                   0.672    
#> Var: schoolid (Intercept)                                           0.180    
#> =============================================================================
#> *** p < 0.001, ** p < 0.01, * p < 0.05
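If $\eqref{eq:2}$ is indeed what Model.3 fits, the three variance components in the Model 3 column should map onto its three error terms. A minimal sketch in base R, with the variances typed in by hand from the table above (the mapping in the comments is my assumption), of the share of unexplained variance at each level:

```r
# Variance components copied by hand from the Model 3 column above
var_school   <- 0.180  # Var: schoolid (Intercept), assumed to be nu_s in (2)
var_child    <- 0.672  # Var: childid:schoolid (Intercept), assumed to be a_i in (2)
var_residual <- 0.334  # Var: Residual, assumed to be u_ist in (2)

# Share of the total unexplained variance attributed to each level
components <- c(school = var_school, child = var_child, residual = var_residual)
shares <- components / sum(components)
round(shares, 3)
```

On these numbers, roughly 15% of the unexplained variance sits at the school level, 57% at the child level, and 28% in the residual.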

Wooldridge, Jeffrey M. (2013). Introductory Econometrics: A Modern Approach. 5th edition. South-Western College. ISBN: 9781285414645. URL: https://www.cengage.co.uk/books/9781111531041/

Eric Fail
  • Yes, your (2) is what you are estimating with your model 3, apart from the fact that you have `Female + size` in your models but this is missing from your formulas (1) and (2). – amoeba Mar 16 '18 at 17:15
  • @amoeba, thank you for your prompt response. That was surprisingly straightforward. I imagined _x_ being a vector of all the independent variables, i.e. `Female` and `size`. Do you happen to know if (2) can be estimated using `plm` from the `plm` package? – Eric Fail Mar 16 '18 at 17:40
  • Don't know about `plm`. – amoeba Mar 16 '18 at 17:50
  • FWIW for multiple independent variables you'd have to have $\beta_1$ be a vector as well (transposed/oriented so that $\beta_1 x_{ist}$ is a dot product). AFAICT eq. (2) and `Model 3` are indeed the same. Is that your entire question ...? – Ben Bolker Mar 17 '18 at 19:52
  • @BenBolker, thank you for your comment! I guess, as most of what I wrote seems to be close to correct, that your comment, along with `amoeba`’s comment, answers almost _the entire question_. I am still looking for a textbook presentation of (2), preferably something that follows the notation style used above - or what source would you recommend? cf. **Part 3.a** above. Ideally a source that covers your point about $\beta_1$, what is transposed and how, which assumptions change, which misspecification tests are recommended, and related topics. Again, I very much appreciate you taking the time to comment! – Eric Fail Mar 17 '18 at 20:36
  • can't really help you with sources, since pretty much all of the sources I follow (Gelman and Hill *Applied Regression Modeling*, Pinheiro and Bates, Bates et al. *J Stat Software* ...) use the multilevel-style rather than the econometrics-style notation. As for things like misspecification testing -- I think you'd be better off asking more focused questions ... – Ben Bolker Mar 17 '18 at 20:50
  • @BenBolker, thanks a lot. I'll keep working on it and post a more focused question down the road. Again, thank you for your time and consideration. – Eric Fail Mar 20 '18 at 11:02
