I am considering the following model: $$y_{it}=\rho y_{it-1}+\beta x_{it}+\delta_i+\delta_{rt}+\epsilon_{it}.\tag{1}$$ Here $y$ is an outcome variable, $x$ our variable of interest, $\delta_i$ denotes subregion fixed effects and $\delta_{rt}$ region $\times$ year fixed effects. (For example, $i$ denotes a grid cell in country $r$: see http://grid.prio.org/). Models such as the one above are suggested in [1] at p. 745.
My question is simple, but there seems to be no answer to it: How do I acquire a consistent estimate of $\beta$ in the model above (for serially uncorrelated or correlated error terms), theoretically and practically?
The usual approach is to apply the difference operator $\Delta$, giving us the model $$\Delta y_{it}=\rho\Delta y_{it-1}+\beta\Delta x_{it}+\eta_{rt}+\Delta\epsilon_{it},\tag{2}$$ where $\eta_{rt}\equiv\Delta\delta_{rt}$. Notice that $\delta_i$ vanishes. The problem with the original model is that $\epsilon_{rt}$ is correlated with $y_{rt}$, and thus strict exogeneity does not hold and pooled OLS would give us biased estimates. In this new, differenced model, $\Delta\epsilon_{it}$ is however correlated with $\Delta y_{it-1}$. To solve this one usually instruments $\Delta y_{it-1}$ with $y_{it-2}$, which is uncorrelated with $\Delta\epsilon_{it}$ assuming no serial correlation in the error term; this leads to the Anderson-Hsiao estimator. Another approach is to use the Arrelano-Bond estimator. $\eta_{rt}$ can be modelled as a linear combination of dummies.
Is this the correct way to theoretically estimate $\beta$? Or do I also need to instrument $x_{it}$ or $\eta_{rt}$ to acquire a consistent estimate of $\beta$? I have seen no formal treatment of the case with multiway fixed effects, $\delta_{rt}$. Also, there seems to be no command in Stata allowing me to account for a large set of dummies (I tried to include a dummy for each $(r,t)$, but I have 55 countries over 14 years, resulting in $770-55$ dummies). I have tried to use the $\texttt{xtdpdsys}$ command in Stata, but it cannot account for $\eta_{rt}$ from what I can see, and $\texttt{reghdfe}$ using $y_{it-2}$ as an instrument (see above), but it gives me a negative $R^2$, suggesting a very bad fit and that my idea to absorb region $\times$ year fixed effects does not work.
Idea
One thought I had was to use the ideas in [2]. There may be a varying number of subregions in each region, but we may average as follows: Let $\Delta \bar{y}_{\cdot rt}=\frac{1}{N_r}\sum_{i\text{ in region }r}\Delta y_{irt}$, where $N_r$ is the number of subregions in region $r$. This is done for all variables in (2). Subtracting $\Delta \bar{y}_{\cdot rt}$ from both sides of (2) then gives us $$(\Delta y_{it}-\Delta \bar{y}_{\cdot rt})=\rho(\Delta y_{it}-\Delta \bar{y}_{\cdot rt})+\beta(\Delta x_{it}-\Delta \bar{x}_{\cdot rt})+(\Delta \epsilon_{it}-\Delta \bar{\epsilon}_{\cdot rt}).$$ I do not know if the arguments in [2] can be directly transferred to this case, as I have different number of subregions $i$ in different regions $r$.
Notes
Suggested pseudo-Stata code using the Arellano-Bond estimator with xtdpdsys:
xtdpdsys y x rtdummies*, nocons lag(1)
Suggested pseudo-Stata code using the Anderson-Hsiao estimator with reghdfe:
reghdfe diff(y) x (diff(lag(y)) = lag(lag(y))), absorb(rt)
References
1 Dell, M., Jones, B. F., & Olken, B. A. (2014). What do we learn from the weather? The new climate–economy literature. Journal of Economic Literature, 52(3), 740-798.
2 Balazsi, L., Matyas, L., & Wansbeek, T. (2015). The estimation of multidimensional fixed effects panel data models. Econometric Reviews, 1-23.