3

If two non-stationary processes are cointegrated, that means a linear combination of the two processes is stationary. In a simple linear regression, we have the model form:

$y = b_0 + b_1x + e$

If we rearrange, we get something like

$(y - b_1x) = b_0 + e$

And thus the linear combination of $y$ and $x$ is stationary with mean $b_0$ and variance $\sigma^2$. If $y$ and $x$ are stock prices, then $b_1$ is the hedge ratio.

So what are the similarities and differences between cointegration and simple linear regression? I am not seeing the big picture for cointegration yet, or why it is useful. The typical example of cointegration has to do with stock prices. Why not just take any two stock prices, run a linear regression between them, check the residuals, and make sure they pass the typical SLR assumptions? Basically, the residuals show stationarity, and thus we can use typical regression methods as opposed to an entirely new suite of cointegration tests and methods.
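
To make this concrete, here is a minimal sketch of the proposed procedure in Python (the series are simulated and the hedge ratio of 1.5 is a made-up illustrative parameter). It is essentially the first step of the Engle-Granger method; the comments note why the usual ADF critical values are not strictly valid on estimated residuals, which is part of the answer to the question above.

```python
# Regress one simulated "price" series on another, then test the
# residuals for stationarity. All parameters are made up for illustration.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
n = 500
x = np.cumsum(rng.normal(size=n))          # random walk: non-stationary I(1)
y = 0.5 + 1.5 * x + rng.normal(size=n)     # cointegrated with x, hedge ratio 1.5

ols = sm.OLS(y, sm.add_constant(x)).fit()
resid = ols.resid

# Caveat: because the residuals are estimated rather than observed, the
# usual ADF critical values are not strictly valid here; the Engle-Granger
# test (statsmodels.tsa.stattools.coint) uses corrected critical values.
adf_stat, adf_pvalue = adfuller(resid)[:2]
print(f"hedge ratio estimate: {ols.params[1]:.3f}")
print(f"ADF statistic on residuals: {adf_stat:.3f}, p-value: {adf_pvalue:.4f}")
```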

confused
  • 2,453
  • 6
  • 26
  • 1
    I think a similar question might have been asked before; you may benefit from checking that. – Richard Hardy Jun 27 '20 at 18:46
  • 4
    Before 1987, if two series were I(1), econometricians knew that there was a problem with regressing them in levels, so they differenced them and regressed the differenced series. This was an improvement, but the level information in the respective series was lost. In 1987, Engle and Granger figured out that regressing them in levels could be okay if certain other assumptions were met. This allowed both level and difference information to remain (if the cointegrating relationship is written as an ECM). So it may not sound like a big deal now, but if you had figured that out in 1986, you might have gotten the Nobel Prize. – mlofton Jun 27 '20 at 20:18
  • @RichardHardy I think I found the one you are referring to, thanks! https://stats.stackexchange.com/questions/107468/cointegration-same-thing-as-stationary-residuals?rq=1 – confused Jun 27 '20 at 22:01
  • @mlofton OK, so that makes sense why cointegration is relegated to such a small part of most texts: standing today and looking back, it's not a big deal, but back then it was a big discovery. So cointegration and SLR are basically the same thing as long as you can prove via tests that the residuals of your SLR are stationary. Maybe one difference is we don't have to assume normality, i.e. the "difference" between the two series is not assumed to follow any distribution when we do a cointegration test - I'll have to double-check this. – confused Jun 27 '20 at 22:06
  • 2
    @confused: what you said is not wrong, but maybe somewhat over-simplifying. Once you re-write the cointegrating regression as an ECM, interesting things come from that. A good book for the ECM discussion is Banerjee et al. Then there's a whole different ball of wax when you look at more than two series. Johansen and Juselius developed the machinery for that and it's no joke. Johansen's book is the bible for that. Juselius may be a co-author, but she also has her own books. It's difficult material and needs a "geometric-linear algebraic" mind. I'm still looking for a store where I can buy one. – mlofton Jun 28 '20 at 00:32
  • 1
    Oh, one thing I left out is your discussion near the end. Note that the DF test probably requires normality of the error term in order to obtain the DF statistics, because they have pretty strange distributions even under the null. – mlofton Jun 28 '20 at 00:35
  • Ok thanks. I have never really seen ECMs or VECMs or VAR so I'll have to look into it. – confused Jun 29 '20 at 04:21
  • @mlofton, as far as I remember, normality is not needed in the DF test, at least asymptotically. In small samples it might be otherwise, though. – Richard Hardy Aug 07 '20 at 08:40
  • Hi Richard: The DF statistic itself is not normally distributed, which is why you use the Dickey-Fuller tables. But the derivation of the distribution of the Dickey-Fuller test statistic uses the fact that the error term in the regression is normally distributed. See their original paper for details. I think it's JASA, late 70's. – mlofton Aug 07 '20 at 13:26
  • Hi Richard: I was a few years off and I don't know if it's easily available on the net. https://www.tandfonline.com/doi/abs/10.1080/01621459.1979.10482531 – mlofton Aug 07 '20 at 13:29
  • @mlofton, hmm, interesting. If this is the case, the test becomes of pretty limited use. I find this surprising, but what do I know :) – Richard Hardy Aug 07 '20 at 15:31
  • Hi: well, you need to assume some distribution for the error term so I don't think normality of error term is a worse choice than some other assumption. In fact, one could argue that it's a better choice. – mlofton Aug 08 '20 at 10:21
  • @mlofton, the assumption could be far less specific. E.g. the OLS estimator in a linear regression works fine under nonnormality, and the assumptions delimiting the set of distributions under which OLS has its nice properties are only mildly restrictive. I was hoping for something similar here, too. – Richard Hardy Aug 08 '20 at 13:17
  • @Richard Hardy: I see what you're saying but note that, in the case of DF, it's hypothesis-testing/inference that's being done rather than estimation. Since, IIRC, they obtain the distribution of the DF statistic through simulation, they need an error term assumption. It's possible that one could obtain different DF tables ( by simulation ) assuming some other distribution but I don't know if anyone has done work on that. – mlofton Aug 09 '20 at 14:44
  • @mlofton, that is a good point. But I think I found a relevant thread ["Can Dickey-Fuller be used if the residuals are non-normal?"](https://stats.stackexchange.com/questions/250505) showing that the normality assumption is not needed as the argument is asymptotic (similarly to how normality is not needed for OLS as the estimators are asymptotically normal regardless, under mild assumptions). Does that make sense? – Richard Hardy Aug 09 '20 at 19:19
  • Richard Hardy: Phillips actually DERIVES the distribution of the Dickey-Fuller test statistics. Dickey and Fuller simulated under the null in order to obtain asymptotic DF test statistics. I'm pretty certain that Dickey and Fuller's simulations assumed normality, but I'll have to read their paper again to confirm or un-confirm (problem is getting my hands on it). Note, though, that Phillips' result is different and more rigorous than Dickey and Fuller's, because they obtained their table via simulation. IIRC, Phillips obtained it analytically. – mlofton Aug 10 '20 at 20:42
  • @mlofton, yes, what I am trying to say is not how Dickey & Fuller obtained their result (you might very well be right about them) but that their test can be safely used without assuming normality. I think this is an important takeaway for practitioners. – Richard Hardy Aug 11 '20 at 08:38
  • Hi Richard: I agree that it's an important issue but I still believe that you need normality. I couldn't get the paper but Kerry Patterson has a really nice book titled "An Introduction To Applied Econometrics". Chapter 6 is on non-stationarity of univariate time series. He doesn't come out and say it EXPLICITLY but, if you read that chapter, I think you'll agree with me. If you can't get your hands on that text, we can agree to disagree :). – mlofton Aug 11 '20 at 21:14
  • @mlofton, I would rather learn the truth than win an argument. Do you think the answer I cited is incorrect? Or does it not apply? If Dickey and Fuller simulated something that can be derived analytically, that does not mean their results are wrong. Nor does it mean the analytical results are wrong. If the two papers discover the same thing though describe it from different perspectives, how does that invalidate any one of them? I cannot really grasp where your objection comes from. – Richard Hardy Aug 12 '20 at 11:25
  • I think you're talking about the Stack Exchange link you pointed to? If so, that link references what Phillips did, and what he did and assumed is very different from what DF did, and it's DEFINITELY NOT WRONG. But I thought we were referring to the DF tables created by Dickey and Fuller? If not, let me know. In the meantime, I'll try to obtain the DF paper from JASA. Just to be clear, my argument that you need normality of the error term refers to the need for normality when one uses the tables generated by DF. That has very little to do with the Phillips link. – mlofton Aug 13 '20 at 14:12
  • @mlofton, I got some feedback from Christoph Hanck, and they have clarified the matter quite a bit. Check out the new comments in the thread I have linked to above. And thank you for a helpful discussion! – Richard Hardy Aug 13 '20 at 14:18
  • Hi Richard: I'll check out the new comments, but could you read the statement right under the table on page 13 of this link? I think that says it pretty clearly. Think of it this way: the DF tables use sample sizes as low as 20 or 25. It's not going to be possible to use an error term with a different distribution and get the same test statistic. http://pages.stern.nyu.edu/~wgreene/Text/Edition7/Manuscript/Greene_Fin_ms_CH23-checked.pdf – mlofton Aug 13 '20 at 14:22
  • Hi Richard: I read the link, and Christoph is saying that there is no controversy, because what Phillips did is very different from what DF did. AFAIK, Phillips' result is not used in practice. I hope this clarifies the issue. – mlofton Aug 13 '20 at 14:29
  • @mlofton, yes, matters are clearer now. For small samples, the error distribution matters; Dickey and Fuller used a normal distribution to obtain the critical values, so this is an (unavoidable) limitation of their tables. For large samples / asymptotically, the error distribution does not matter. (I find it helpful to think that this parallels the properties of an OLS estimator. It also has a known asymptotic distribution (normal), but an unknown small sample distribution.) Thank you for your help! – Richard Hardy Aug 13 '20 at 14:40
  • Hi Richard: I'm glad that I helped. To be honest, I'm not sure about the large sample issue. By this I mean: could one still use the DF tables (the larger-$n$ columns of them) if one didn't assume normality for the error term? That's a question for the experts. But I keep in mind that what Christoph was referring to has close to zero relation to the Dickey-Fuller tables. Phillips took the unit root investigation to another level, which is often the case with him! – mlofton Aug 13 '20 at 16:47
  • @mlofton, are you sure? I understood that Phillips and DF are closely related; Phillips simply provides the theory for what DF are simulating. Here is what Christoph Hanck said: *To perform that simulation, you must draw errors from some distribution, and the conventional choice is to simulate normal errors. **But, as Phillips shows, if you were to draw the errors from some other distribution satisfying the above requirements, you would asymptotically get the same distribution.*** So one can use the DF tables with large $n$ for any distribution. – Richard Hardy Aug 13 '20 at 18:37
  • @mlofton, perhaps this is why textbooks typically do not mention the normality assumption in the context of the DF test (unless I remember wrong). – Richard Hardy Aug 13 '20 at 18:39
  • Hi Richard: I agree that you get the ASYMPTOTIC distribution that Phillips derives if you use non-normal error terms. But the asymptotic distribution of the DF test statistic that Phillips derives (which is not used by practitioners) only corresponds to the $n = \infty$ column of the DF table, and I doubt that column even exists in the table. The DF test statistics/critical values in the columns of the DF tables are based on the normal error term assumption (see the Greene link I pointed to in a comment) and, except for $n = \infty$, have little to do with the distribution derived by Phillips. – mlofton Aug 14 '20 at 16:41
  • Richard: I guess the best way to say it is: yes, Phillips simply provides the ASYMPTOTIC theory for what DF are simulating. But what DF are simulating is not asymptotic. DF obtain the small sample distribution through simulation, and this is a very different beast. Therefore, the DF tables are essentially unrelated to Phillips' asymptotic distribution. – mlofton Aug 14 '20 at 16:46
  • @mlofton, saying that the DF tables are essentially unrelated to Phillips asymptotic distribution is a very strong (and hopefully wrong) statement. Are you sure $n=100$ (or $n=1000$) and $n=\infty$ produce noticeably different distributions? Asymptotic distributions would be useless in statistics if they did not approximate finite sample distributions for sufficiently large $n$. I hope Phillips did not labour for nothing. – Richard Hardy Aug 14 '20 at 17:14
  • Hi Richard: The best thing to do would be to simulate under the error term of interest and see if the simulation gives the same critical values as the DF table (see the simulation sketch after these comments). I would tend to doubt it will, but I could be wrong. I'm not claiming infallibility here :). All I'm saying is that using Phillips' result to make the claim is not the correct thing to do. – mlofton Aug 15 '20 at 17:11
  • Just one more thing: keep in mind that what Phillips derives is so complicated that it would be quite difficult to compare his approach to DF directly. But my goal was not to denigrate Phillips' contributions. He's a god in econometrics, but that doesn't mean that the majority of the research he develops is necessarily useful for practitioners. Often, very smart people are interested in theory for theory's sake, which is fine and can also be quite useful. I couldn't even understand most of the things that Phillips does, so I definitely did not intend to minimize his contributions. – mlofton Aug 15 '20 at 17:17
  • @mlofton, OK, thank you very much for an enlightening discussion! – Richard Hardy Aug 15 '20 at 17:50
  • Hi Richard: I'm glad that you found it enlightening. I looked (just glanced very briefly) at the Phillips paper and it is more "applied" than I expected (still TOUGH AND THEORETICAL, but with more of an applied flavor than a lot of his other ones). It uses a spectral density approach and does compare some results to DF. The conclusion in the last part seems to say that, in finite samples, DF is preferred. But that's not my point. As I said, my point was that using the assumptions made in Phillips' paper to infer that the DF tables don't assume normality is not the way to go. All the best. – mlofton Aug 16 '20 at 19:05
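
A minimal simulation sketch of the experiment suggested in the comments above: estimate the 5% finite-sample critical value of the (no-constant) Dickey-Fuller statistic under normal errors and under heavy-tailed t(3) errors. The sample sizes, replication count, and error distributions here are arbitrary illustrative choices, not the original DF setup.

```python
# Tabulate the Dickey-Fuller t-statistic under the unit-root null for two
# error distributions and compare the simulated 5% critical values.
import numpy as np

rng = np.random.default_rng(42)

def df_tstat(errors):
    """t-statistic for gamma = 0 in dy_t = gamma * y_{t-1} + e_t (no constant)."""
    y = np.cumsum(errors)                     # random walk under the null
    y_lag, dy = y[:-1], np.diff(y)
    gamma_hat = (y_lag @ dy) / (y_lag @ y_lag)
    resid = dy - gamma_hat * y_lag
    s2 = (resid @ resid) / (len(dy) - 1)      # residual variance
    se = np.sqrt(s2 / (y_lag @ y_lag))
    return gamma_hat / se

for n in (25, 250):
    for name, draw in [("normal", rng.normal),
                       ("t(3)", lambda size: rng.standard_t(3, size=size))]:
        stats = np.sort([df_tstat(draw(size=n)) for _ in range(20000)])
        crit = stats[int(0.05 * len(stats))]  # left tail: DF rejects for large negatives
        print(f"n={n:4d}  errors={name:7s}  5% critical value: {crit:.2f}")
```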

3 Answers

2

Cointegration and regression are quite different categories.

Cointegration is a phenomenon observed in a time series context. Several time series cointegrate if there exists a linear combination of them that is integrated of a lower order than the series themselves. (See also the tag description for [cointegration].)

Regression has several meanings. The most relevant here is perhaps the one in the tag description of [regression], which says it covers techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.

The relationship between cointegration and regression is that one can use regression to analyze the relationship between several cointegrated variables.

(Unlike the simple case of cross-sectional data, standard regression estimators such as OLS in a naive regression of several cointegrated variables have some unusual properties, e.g. superconsistency. A helpful regression model for cointegrated time series is the cointegrated (restricted) VAR and its alternative representation, the VECM, which clearly exposes the short- and long-run relationships between the variables.)
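
A minimal sketch of fitting a VECM with statsmodels to two simulated cointegrated series; the data generating process, lag order, and cointegration rank below are illustrative assumptions, not a recommendation:

```python
# Two series sharing one I(1) common trend are cointegrated; a VECM exposes
# both the long-run (beta) and short-run adjustment (alpha) structure.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import VECM

rng = np.random.default_rng(1)
n = 400
common_trend = np.cumsum(rng.normal(size=n))   # shared I(1) component
y1 = common_trend + rng.normal(size=n)
y2 = 0.8 * common_trend + rng.normal(size=n)
data = np.column_stack([y1, y2])

model = VECM(data, k_ar_diff=1, coint_rank=1, deterministic="co")
res = model.fit()
print("cointegrating vector (beta):", res.beta.ravel())   # long-run relationship
print("adjustment speeds (alpha):  ", res.alpha.ravel())  # error-correction terms
```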

Richard Hardy
  • 54,375
  • 10
  • 95
  • 219
0

On the level of data generating processes, cointegration is a special case of linear regression. (In this sense, I disagree somewhat with @RichardHardy.)

Say the time series $(x_t, y_t)$, $t = 1, 2, \ldots$, follow a linear regression model if $$ y_t = \beta x_t + \epsilon_t, \mbox{ where } E[\epsilon_t] = 0. $$

If we agree on this terminology, then clearly a cointegrating relationship is a special case of linear regression. You might call it a "cointegrating regression".

The difference lies in the distributional assumptions on the data generating process $(x_t, y_t)$, $t=1,2,\ldots$. In a usual regression model, $(x_t, y_t)$ is stationary. For cointegration, $x_t$ and $y_t$ are both non-stationary, but the linear combination $y_t - \beta x_t$ is stationary. These two settings are very different, from both statistical and empirical perspectives. (In this sense, I don't disagree with @RichardHardy.)

For example, statistically, under stationarity, the OLS estimator $\hat{\beta}$ is consistent only if $E[x_t \epsilon_t] = 0$ (or at least $\frac{1}{n} \sum_{t=1}^n E[x_t \epsilon_t] \rightarrow 0$). Under cointegration, $\hat{\beta}$ is always super-consistent.

Empirically, cointegration is about modelling long-run equilibrium relationships, whereas under stationarity the regression describes a contemporaneous relationship.
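
A quick simulation sketch of the super-consistency claim (the data generating process and sample sizes are made-up illustration): under cointegration, the OLS slope error shrinks roughly like $1/n$ rather than the usual $1/\sqrt{n}$.

```python
# Mean absolute OLS error at increasing sample sizes; under cointegration
# it should drop by roughly a factor of 10 for each tenfold increase in n.
import numpy as np

rng = np.random.default_rng(7)
beta = 2.0
for n in (100, 1000, 10000):
    errs = []
    for _ in range(500):
        x = np.cumsum(rng.normal(size=n))      # I(1) regressor
        y = beta * x + rng.normal(size=n)      # cointegrated with x
        beta_hat = (x @ y) / (x @ x)           # OLS without intercept
        errs.append(abs(beta_hat - beta))
    print(f"n={n:6d}  mean |beta_hat - beta| = {np.mean(errs):.5f}")
```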

Michael
  • 2,853
  • 10
  • 15
0

Regression and cointegration are, in a nutshell, different things. A cointegrating relation comes out of a VECM approach; in this view, it is just a long-term relation inside a system with short-term elements (the VECM system). Naturally, if you have just two cointegrated variables, the regression equation will also be the cointegrating relation; they are equal. But in higher dimensions, with three variables or more, the equations will differ and you should use appropriate methods.
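
For the three-or-more-variables case, a minimal Johansen trace-test sketch with statsmodels (the data are simulated and all parameters are illustrative choices):

```python
# Three series sharing one common I(1) trend imply cointegration rank 2;
# the Johansen trace test estimates that rank from the data.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(3)
n = 400
trend = np.cumsum(rng.normal(size=n))          # one shared I(1) trend
data = np.column_stack([
    trend + rng.normal(size=n),
    0.5 * trend + rng.normal(size=n),
    -1.0 * trend + rng.normal(size=n),
])                                             # => cointegration rank 2

res = coint_johansen(data, det_order=0, k_ar_diff=1)
for r, (stat, cvs) in enumerate(zip(res.lr1, res.cvt)):
    print(f"H0: rank <= {r}: trace stat = {stat:.2f}, 5% critical value = {cvs[1]:.2f}")
```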