
I want to regress my dependent variable on my independent variables in R. First, for the levels of the variables: `lm(y~x+z+u)`. Now, since my variables are non-stationary, I have to take the first difference of each variable. My question is: is it right to do the following: `lm(diff(y)~diff(x)+diff(z)+diff(u))`?
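For concreteness, the two specifications, assuming `y`, `x`, `z` and `u` are plain numeric vectors of equal length (the object names are just placeholders):

```
# Regression in levels
fit_levels <- lm(y ~ x + z + u)

# Regression in first differences; diff() drops one observation from each series
fit_diff <- lm(diff(y) ~ diff(x) + diff(z) + diff(u))
```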

My question arises because I read the question and the corresponding answer from this thread: How do I interpret my regression with first differenced variables? What baffled me was the answer from Charlie. Is taking the first difference of each variable the same as subtracting $y_{t-1}$ from each side of the model in levels? I.e., is subtracting $y_{t-1}$ from both sides of `lm(y~x+z+u)` equal to `lm(diff(y)~diff(x)+diff(z)+diff(u))`? I already posted this question on Stack Overflow, but since it has nothing to do with programming/coding, I deleted it and reposted it on this forum.

Michael B
  • Many thanks. Could you please explain your formula `lm(diff(y)~x[-1]+z[-1]+u[-1]+offset(1*tail(y,-1)))` in words? – Michael B Apr 05 '15 at 17:06
  • `lm(diff(y)~-1+x[-1]+z[-1]+u[-1]+offset(-1*head(y,-1)))` is equivalent to running a regression of the form $\Delta y_t=\beta_1 x_t+\beta_2 z_t+\beta_3 u_t-1 \cdot y_{t-1}+\varepsilon_t$. You may look up the functions `diff`, `head` and `offset` separately in the R help files. `[-1]` and `head(y,-1)` are used to effectively create the non-lagged and lagged variables; see the small indexing sketch after these comments. – Richard Hardy Apr 09 '15 at 18:10
  • As you perhaps noticed, I turned my comments into an answer and fixed a few typos and omissions in the formulas. Nothing substantively new was added. – Richard Hardy Apr 09 '15 at 18:11
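For reference, a tiny indexing example (toy numbers, purely illustrative) showing how `[-1]`, `head(y,-1)` and `diff(y)` line up:

```
# Toy vector; think of the positions as t = 1..5
y <- c(10, 12, 15, 19, 24)
y[-1]        # y_t     for t = 2..5:  12 15 19 24
head(y, -1)  # y_{t-1} for t = 2..5:  10 12 15 19
diff(y)      # y_t - y_{t-1}:          2  3  4  5
```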

1 Answer


Using `lm(diff(y)~-1+diff(x)+diff(z)+diff(u))` (note the `-1` to remove the intercept) is perfectly fine (the answer by Charlie is not at odds with this approach) -- unless your variables are cointegrated. If they are, a vector error correction model (VECM) would be more appropriate.
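One rough way to screen for cointegration before settling on the differenced regression is an Engle-Granger style residual check; the sketch below assumes the `tseries` package is installed and that `y`, `x`, `z`, `u` are plain numeric vectors:

```
library(tseries)  # for adf.test()

# Engle-Granger style screen: fit the levels regression, then test whether
# its residuals look stationary. Rejection of the unit-root null hints at
# cointegration, in which case a VECM would be the better tool.
levels_fit <- lm(y ~ x + z + u)
adf.test(residuals(levels_fit))
# Caveat: adf.test() uses standard Dickey-Fuller critical values, which are
# only approximate for residual-based checks; dedicated cointegration tests
# (e.g. in package urca) are preferable for a formal conclusion.
```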

Since $y_{t-1}$ is not exactly equal to a linear combination of $x_{t-1}$, $z_{t-1}$ and $u_{t-1}$ (because there is an error term, too!),

`lm(diff(y)~-1+diff(x)+diff(z)+diff(u))`
(equivalently $\Delta y_t=\beta_1 \Delta x_t+\beta_2 \Delta z_t+\beta_3 \Delta u_t+\varepsilon_t$)

will not give exactly the same numerical results as

`lm(diff(y)~-1+x[-1]+z[-1]+u[-1]+offset(-1*head(y,-1)))`
(equivalently $\Delta y_t=\beta_1 x_t+\beta_2 z_t+\beta_3 u_t-1 \cdot y_{t-1}+\varepsilon_t$)

But the two should be pretty close.
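As a rough illustration with simulated data (all object names and parameter values below are made up for the sketch):

```
set.seed(1)
n <- 200
x <- cumsum(rnorm(n))   # random-walk regressors
z <- cumsum(rnorm(n))
u <- cumsum(rnorm(n))
y <- 0.5 + 1.0 * x - 0.5 * z + 0.3 * u + rnorm(n)

# Differenced specification
fit_diff <- lm(diff(y) ~ -1 + diff(x) + diff(z) + diff(u))

# "Subtract y_{t-1} from both sides" specification, with its coefficient fixed at -1
fit_off <- lm(diff(y) ~ -1 + x[-1] + z[-1] + u[-1] + offset(-1 * head(y, -1)))

cbind(differenced = coef(fit_diff), offset_form = coef(fit_off))  # similar, not identical
```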

Richard Hardy
  • Thanks for your further explanation. But why do you remove the intercept? – Michael B Apr 09 '15 at 18:43
  • Because first you have a relationship $y_t=\beta_0+\beta_1 x_t+\beta_2 z_t+\beta_3 u_t+\varepsilon_t$, as inferred from your OP: `lm(y~x+z+u)`. When you difference both sides of this equation, the intercept cancels out and disappears. – Richard Hardy Apr 09 '15 at 19:07
  • Thanks, but that's exactly what confuses me. Why should the intercept drop out in this case? Isn't that only the case if you subtract Y(t-1) from both sides? (How do I type mathematical formulas on this forum?) – Michael B Apr 09 '15 at 19:59
  • 1
    For a short example, consider $y_t=\beta_0+\beta_1 x_t+\varepsilon_t$. Then $y_{t-1}=\beta_0+\beta_1 x_{t-1}+\varepsilon_{t-1}$. Subtract the latter equation from the former one to get $\Delta y_t=\beta_1 \Delta x_t+\Delta \varepsilon_t$ and replace $\Delta \varepsilon_t$ with $u_t$ to make it look nicer. Regarding mathematical formulas: open any post with formulas, click "edit" and see how they were produced. Alternatively, search for "MathJax", "Latex" or similar. – Richard Hardy Apr 09 '15 at 20:07
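For completeness, a quick numerical check of the intercept cancellation (simulated data, purely illustrative):

```
set.seed(42)
x <- cumsum(rnorm(300))          # a random-walk regressor
y <- 2 + 0.7 * x + rnorm(300)    # levels model with intercept beta_0 = 2
coef(lm(diff(y) ~ -1 + diff(x))) # slope estimate close to 0.7, no intercept needed
coef(lm(diff(y) ~ diff(x)))      # if an intercept is included, it is estimated near 0
```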