Consider the model estimated in levels (also assume this is the true population model):
$$y_t = x_t\beta + e_t$$
As usual, we have the dependent variable $y$, the independent variable $x$, the error term $e$, and the parameter $\beta$ to be estimated.
Subtract $y_{t-1}$ from both sides and substitute $y_{t-1} = x_{t-1}\beta + e_{t-1}$ on the right-hand side to obtain the regression in differences:
$$y_t - y_{t-1} = x_t \beta - x_{t-1} \beta + e_t - e_{t-1}$$ $$ \Delta y_t = \Delta x_t \beta + \Delta e_t$$
Since the only difference between the two models is the autocorrelation structure of the error term, one would assume the transformation does not introduce bias: if OLS in levels is unbiased, OLS in differences should be as well. One estimator may be more efficient than the other because the errors have different autocorrelation structures, but asymptotically the betas should be exactly the same; the only reason one could differ from the other is bias.
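To make this concrete, here is a minimal Monte Carlo sketch (my own construction, not from any cited source) in which $x_t$ is stationary and the error follows an AR(1) process. Under these assumptions both the levels and the differences OLS estimators should be centered on the true $\beta$; the parameter values and the AR coefficient of 0.7 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
T, reps, beta = 500, 1000, 2.0
b_lvl, b_dif = [], []
for _ in range(reps):
    x = rng.normal(size=T)
    # AR(1) error, independent of x
    e = np.zeros(T)
    eps = rng.normal(size=T)
    for t in range(1, T):
        e[t] = 0.7 * e[t - 1] + eps[t]
    y = beta * x + e
    # OLS without intercept in levels
    b_lvl.append((x @ y) / (x @ x))
    # OLS without intercept in first differences
    dx, dy = np.diff(x), np.diff(y)
    b_dif.append((dx @ dy) / (dx @ dx))

print(np.mean(b_lvl), np.mean(b_dif))  # both close to beta = 2
```

In this stationary setup the two average estimates agree, which is exactly the puzzle: the claimed divergence must come from somewhere other than the differencing itself.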
Yet the betas are supposed to differ from one another, and it is often said that estimates from the difference regression cannot be used to infer the levels relationship. (This is one motivation for (V)ECMs; omitted-variable bias could be another.)
My question is: what is the actual relationship between the betas of the levels and the differences regressions? Is the divergence of the parameter estimates in fact only a result of violating Gauss-Markov assumptions other than no autocorrelation? For example, if the variables are related but behave like I(1) processes, the levels regression obviously has bias.
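The I(1) case can be sketched the same way (again my own illustrative construction): regress two independent random walks on each other, so the true $\beta$ is zero. The levels estimates are widely dispersed and do not concentrate as $T$ grows (the classic spurious-regression pattern), while the differenced estimates concentrate near zero.

```python
import numpy as np

rng = np.random.default_rng(1)
T, reps = 200, 1000
b_lvl, b_dif = [], []
for _ in range(reps):
    # two independent random walks: the true beta is zero
    x = np.cumsum(rng.normal(size=T))
    y = np.cumsum(rng.normal(size=T))
    # OLS without intercept, in levels and in differences
    b_lvl.append((x @ y) / (x @ x))
    dx, dy = np.diff(x), np.diff(y)
    b_dif.append((dx @ dy) / (dx @ dx))

# levels: spuriously dispersed; differences: tight around 0
print(np.std(b_lvl), np.std(b_dif))
```

This suggests that when the divergence between the two betas shows up, it is tied to violations like nonstationarity rather than to the differencing transformation per se.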
Of course, this is also problematic for panel first-difference models if only short-run dynamics can be inferred from differenced variables. What is the truth of the matter?
For example, in this video the argument is made that the estimates of beta will not be the same in the two equations. The results are even simulated, and I cannot see any faults in the simulation, although the betas themselves are not presented.