I am running a regression on quarterly data for 19 companies. The data span 2007-2019, giving about 30-50 quarters per company. My regression model in Stata is as follows:

DV (in quarter t+1) = constant + IV (in quarter t) + Controls (in quarter t) + Lagged DV (i.e., DV in quarter t) + error

IV stands for independent variable and DV for dependent variable. The lagged DV is just a control variable and not my main variable of interest. The main variable of interest is the IV (in quarter t). I run the above using quarter and firm fixed effects and robust standard errors.
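For reference, the specification described above can be written with the fixed effects made explicit. This is only a sketch: the post defines no symbols, so the notation here is assumed. $i$ indexes firms and $t$ quarters, $\alpha_i$ and $\lambda_t$ are the firm and quarter fixed effects, $x_{it}$ is the IV, $\mathbf{c}_{it}$ is the vector of controls, and $\phi$ is the coefficient on the lagged DV.

$$y_{i,t+1}=\alpha_i+\lambda_t+\beta x_{it}+\gamma'\mathbf{c}_{it}+\phi\, y_{it}+\varepsilon_{i,t+1}$$

In this notation the question is whether having $y_{it}$ on the right-hand side affects the OLS estimate of $\beta$, or only that of $\phi$.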

Question: does the inclusion of the lagged DV bias all coefficients, or just the coefficient on the lagged DV? I know I should control for some sort of autocorrelation, but how can I do that (e.g., using the prais command)? Is there anything else I can do to test the robustness of my results?

Any help is appreciated.

  • What are your assumptions about $e_t | X$, where $e_t$ is the error and $X$ is the vector of IVs? – Dayne Oct 21 '20 at 18:15
  • The assumption is that they are uncorrelated; otherwise I would have omitted variable bias. I also have no reason to believe that the error term is correlated with any of the independent variables. But since I now have a lag of the DV, there would be AR(1) correlation between the error terms – Aishwarya Deore Oct 21 '20 at 20:46
  • But if you have already included an AR(1) term in the model, why would the errors have AR(1) correlation? – Dayne Oct 22 '20 at 01:29

1 Answer

Not a complete answer but maybe we can analyze this by partitioned regression:

$$\mathbf{y_t}=\mathbf{X}\beta+\phi \mathbf{y_{t-1}}+\mathbf{e_t}$$

The OLS estimate $\hat{\beta}$ would be:

$$\hat{\beta}=(\mathbf{X'X})^{-1}\mathbf{X'}(\mathbf{y_t-\hat{\phi}y_{t-1}})=(\mathbf{X'X})^{-1}\mathbf{X'}(\mathbf{X}\beta+(\phi-\hat{\phi})\mathbf{y_{t-1}+e_t})$$

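Therefore, assuming $E(\mathbf{e_t}\mid\mathbf{X})=0$ and using the stationary mean $E(\mathbf{y_{t-1}})=\mathbf{X}\beta/(1-\phi)$ (which treats $\mathbf{X}$ as fixed over time and requires $|\phi|<1$),

\begin{align} E(\hat{\beta})&=\beta+(\mathbf{X'X})^{-1}\mathbf{X'}\big(\phi E(\mathbf{y_{t-1}})-E(\hat{\phi}\mathbf{y_{t-1}})\big) \\&=\beta + \beta\frac{\phi}{1-\phi}-(\mathbf{X'X})^{-1}\mathbf{X'}E(\hat{\phi}\mathbf{y_{t-1}}) \\&=\frac{\beta}{1-\phi}- (\mathbf{X'X})^{-1}\mathbf{X'}E(\hat{\phi}\mathbf{y_{t-1}}) \end{align}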

I am unable to evaluate the term $E(\hat{\phi}\mathbf{y_{t-1}})$. For $\hat{\beta}$ to be unbiased it would have to equal $\phi E(\mathbf{y_{t-1}})$, but because $\hat{\phi}$ is a function of $y_t$, I think a lingering covariance term remains, so it would not. If that is correct, then the other parameter estimates are biased as well.

(I tried searching for OLS estimation and inference of ARIMAX models but couldn't find anything. Also, the results above are for simple OLS, not for robust standard errors.)
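One way to probe the open question above numerically is a small Monte Carlo. The sketch below is not part of the original derivation. It assumes a single time series with one regressor and an intercept, and every parameter value is hypothetical (`beta = -0.5`, `phi = 0.5`). `rho_x` and `rho_e` are assumed names for the AR(1) coefficients of the regressor and of the error. It averages the OLS estimates over many simulated series so the mean can be compared with the true values.

```python
import random


def ols_two_regressors(y, x1, x2):
    """Slopes from OLS of y on x1 and x2 with an intercept.

    Demeaning all three series absorbs the intercept; the two slopes
    then solve the 2x2 normal equations by Cramer's rule.
    """
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    yd = [v - my for v in y]
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    s11 = sum(u * u for u in a)
    s22 = sum(u * u for u in b)
    s12 = sum(u * v for u, v in zip(a, b))
    s1y = sum(u * v for u, v in zip(a, yd))
    s2y = sum(u * v for u, v in zip(b, yd))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def mean_estimates(beta, phi, rho_x, rho_e, T=50, reps=2000, seed=0, burn=100):
    """Average OLS estimates of (beta, phi) over simulated series.

    Hypothetical DGP: x_t = rho_x * x_{t-1} + eta_t,
                      e_t = rho_e * e_{t-1} + eps_t,
                      y_t = beta * x_t + phi * y_{t-1} + e_t,
    with eta_t and eps_t i.i.d. standard normal.
    """
    rng = random.Random(seed)
    sum_b = sum_p = 0.0
    for _ in range(reps):
        x = e = y = 0.0
        xs, ys = [], []
        for _ in range(burn + T + 1):
            x = rho_x * x + rng.gauss(0.0, 1.0)
            e = rho_e * e + rng.gauss(0.0, 1.0)
            y = beta * x + phi * y + e  # y on the right is last period's value
            xs.append(x)
            ys.append(y)
        xs, ys = xs[burn:], ys[burn:]  # drop burn-in so start-up values wash out
        # Regress y_t on x_t and y_{t-1}: T usable observations.
        b_hat, p_hat = ols_two_regressors(ys[1:], xs[1:], ys[:-1])
        sum_b += b_hat
        sum_p += p_hat
    return sum_b / reps, sum_p / reps


if __name__ == "__main__":
    for rho_e in (0.0, 0.5):
        b, p = mean_estimates(beta=-0.5, phi=0.5, rho_x=0.8, rho_e=rho_e)
        print(f"rho_e={rho_e}: mean beta_hat={b:.3f}, mean phi_hat={p:.3f}")
```

With `rho_e = 0` the errors are serially uncorrelated and OLS with a lagged DV is consistent, so the average `beta_hat` should sit close to `beta` once the series is moderately long. Setting both `rho_x` and `rho_e` above zero is the case where a bias toward zero that does not vanish with more data would be expected. The results depend on the assumed design and say nothing about robust standard errors.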

Dayne
  • Thank you so much for this. I have read elsewhere that including a lagged DV can bias the coefficients of the other IVs. My expectation is for the coefficient of the main X to be negative. My coefficients are -0.03 and -0.05 when I do and don't control for the lagged DV, respectively. Both are significant. If the bias is downward (i.e., toward a smaller magnitude like -0.03), that is still okay for me, since I am interested in the sign and significance and not the magnitude – Aishwarya Deore Oct 22 '20 at 13:19
  • Actually, I hope someone else gives a thorough solution. From what I have done, I am unable to conclude whether it will be biased or not. – Dayne Oct 22 '20 at 13:35
  • See this link: https://stats.stackexchange.com/questions/52458/inclusion-of-lagged-dependent-variable-in-regression – Aishwarya Deore Oct 22 '20 at 13:37
  • Thanks for the reference. So the coefficients do get biased if estimated using OLS. – Dayne Oct 22 '20 at 13:42
  • Yes, it seems so. But according to section 4.4 of the Keele & Kelly paper (Dynamic Models for Dynamic Theories: The Ins and Outs of Lagged Dependent Variables), the estimates are asymptotically unbiased, or at least the bias is very small. Importantly, if the DV is correlated with its lag, OLS with the LDV is the best option; one cannot omit the LDV – Aishwarya Deore Oct 22 '20 at 14:49
  • More importantly, the coefficients get biased downward and can become insignificant. However, in my case they remain significant, and since I am only interested in the predicted direction and not in commenting much on the magnitude, I don't think using OLS should be a problem – Aishwarya Deore Oct 22 '20 at 14:51
  • Yes, if the lagged DV is negatively correlated with the DV, then the bias would be upward. However, in my case the lagged DV is positively correlated with the DV, so the bias should be downward. But my prediction is for a negative association between the IV and the DV. Does that mean the bias is upward in my case (more negative is an overestimation)? Yet my negative coefficient shrinks in magnitude when I use a lagged DV – Aishwarya Deore Oct 22 '20 at 15:10
  • I think I misread. Based on Keele & Kelly paper, $\hat{\beta}$ may actually be unbiased. The model in equation (10) assumes $X$ variable(s) also to be autocorrelated - which I don't think is the case for you (at least I didn't assume it). If you put $\rho_1=0$ in equation (13), it would mean $\hat{\beta}$ is unbiased. Isn't it? – Dayne Oct 22 '20 at 15:10
  • Yes, that is true if the X's are assumed to be uncorrelated. However, if they are positively correlated, the bias would be downward, correct? And what would this mean if the expected beta is negative to begin with? But anyway, according to the paper, even in this case the bias is only 6% and drops to 3% when the number of observations increases, correct? – Aishwarya Deore Oct 22 '20 at 15:21
  • So the original paper from which Keele & Kelly seem to have borrowed is [this](http://www-personal.umich.edu/~franzese/Achen.2000.LDVstealingExplanPower.pdf). Based on this (see eq. 9), you need both $\rho_1, \rho_2 > 0$ for $\hat{\beta}$ to be biased. If $\beta$ is actually negative then there will be upward bias. In your case since $\hat{\beta}<0$ and significant, you can be confident that $\beta<0$. But interesting thing is that unless $\rho_1, \rho_2 > 0$ your estimates are actually unbiased. – Dayne Oct 22 '20 at 15:27
  • What happens when fixed effects are included in the model? I know there is literature saying not to use lagged DV when using random effects model but what about fixed effects? – Aishwarya Deore Oct 27 '20 at 01:43
  • Not sure about that, but I don't see an obvious reason to be concerned. A firm fixed effect is just like an intercept: it is time-invariant, with a separate value for each company. Unless you are subtracting company-wise means from the equation, it is going to be part of the intercept estimate. Isn't it? – Dayne Oct 28 '20 at 23:44
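On the fixed-effects question, one caveat is worth checking. If the firm effects are removed by subtracting firm means (the within transformation), or equivalently estimated with firm dummies, the demeaned lagged DV is correlated with the demeaned error. The resulting downward bias in the lagged-DV coefficient is often called Nickell bias, and it shrinks roughly in proportion to 1/T. The sketch below is an illustration under assumed conditions, not the poster's setup. It uses firm effects only (no quarter effects), one i.i.d. regressor, i.i.d. errors, and hypothetical values `beta = -0.5` and `phi = 0.5`, and it compares a short panel with one of roughly the poster's length.

```python
import random


def within_estimates(beta, phi, n_firms=19, T=40, reps=500, seed=0, burn=50):
    """Average within (firm fixed-effects) estimates of (beta, phi).

    Hypothetical DGP: y_it = alpha_i + beta * x_it + phi * y_{i,t-1} + e_it,
    with x_it and e_it i.i.d. standard normal and alpha_i ~ N(0, 1).
    """
    rng = random.Random(seed)
    sum_b = sum_p = 0.0
    for _ in range(reps):
        s11 = s22 = s12 = s1y = s2y = 0.0
        for _ in range(n_firms):
            alpha = rng.gauss(0.0, 1.0)
            y = alpha / (1.0 - phi)  # start at the firm's long-run mean
            xs, ys = [], []
            for _ in range(burn + T + 1):
                x = rng.gauss(0.0, 1.0)
                y = alpha + beta * x + phi * y + rng.gauss(0.0, 1.0)
                xs.append(x)
                ys.append(y)
            # Last T periods, with the lagged DV aligned one period back.
            dep, x_t, lag = ys[-T:], xs[-T:], ys[-T - 1:-1]
            # Within transformation: subtract each firm's own time means.
            md, mx, ml = sum(dep) / T, sum(x_t) / T, sum(lag) / T
            for d, a, b in zip(dep, x_t, lag):
                d, a, b = d - md, a - mx, b - ml
                s11 += a * a
                s22 += b * b
                s12 += a * b
                s1y += a * d
                s2y += b * d
        # Pooled OLS on the demeaned data: 2x2 normal equations.
        det = s11 * s22 - s12 * s12
        sum_b += (s22 * s1y - s12 * s2y) / det
        sum_p += (s11 * s2y - s12 * s1y) / det
    return sum_b / reps, sum_p / reps


if __name__ == "__main__":
    for T in (5, 40):
        b, p = within_estimates(beta=-0.5, phi=0.5, T=T)
        print(f"T={T}: mean beta_hat={b:.3f}, mean phi_hat={p:.3f}")
```

Under these assumptions the lagged-DV coefficient should be pulled well below `phi` in the short panel and much less so with 40 periods, which suggests the issue matters less for a panel with 30-50 quarters per firm than for very short panels. Whether that carries over to the actual data depends on how persistent the DV is and on how the regressors behave.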