$R^2$ and adjusted $R^2$ in presence of overlapping observations

Question

Given a linear model $$ y=X\beta+\varepsilon, $$ the population value of $R^2$ is $$ R^2=1-\frac{\text{Var}(\varepsilon)}{\text{Var}(y)}. $$ The vanilla estimator of $R^2$ is $$ \hat R^2=1-\frac{\widehat{\text{Var}}_{biased}(\varepsilon)}{\widehat{\text{Var}}_{biased}(y)}=1-\frac{\frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i^2}{\frac{1}{n}\sum_{i=1}^n (y_i-\bar{y})^2} $$ and the adjusted estimator of $R^2$ is $$ \hat R^2_{adj.}=1-\frac{\widehat{\text{Var}}_{unbiased}(\varepsilon)}{\widehat{\text{Var}}_{unbiased}(y)}=1-\frac{\frac{1}{n-p-1}\sum_{i=1}^n \hat\varepsilon_i^2}{\frac{1}{n-1}\sum_{i=1}^n (y_i-\bar{y})^2}, $$ where $p$ is the number of regressors excluding the intercept.

These estimators apply to the case where the observations of the variables do not overlap. I am interested in the case where they do. Under overlapping observations where the overlap is of length $k$*, the long-run variance of a generic variable $x$ (where we may put $y$ or $\varepsilon$ in its place as needed) is $$ \text{LRVar}(x)=\sum_{j=-k}^k \text{Cov}(x_t,x_{t-j})=\text{Var}(x)+2\sum_{j=1}^k \text{Cov}(x_t,x_{t-j}) $$ and some estimators for it (such as Newey-West) are available.**
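For concreteness, here is a minimal sketch of the two estimators above, together with one possible LRVar-based analogue that replaces both variances by a Newey-West (Bartlett-kernel) long-run variance with truncation lag $k$. The function names are hypothetical, the LRVar-based ratio is only an illustration of the idea in the question (not an established estimator), and the simulated data (regressor and error built as $k$-period overlapping sums of white noise) are an assumption.

```python
import numpy as np

def ols_residuals(y, X):
    """Fit y on an intercept plus the columns of X by OLS; return residuals and p."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ beta, Z.shape[1] - 1   # p = number of regressors excluding intercept

def r2_vanilla(y, resid):
    """Ratio of the biased (1/n) variance estimators."""
    return 1 - np.mean(resid**2) / np.mean((y - y.mean())**2)

def r2_adjusted(y, resid, p):
    """Ratio of the unbiased variance estimators (divisors n-p-1 and n-1)."""
    n = len(y)
    return 1 - (np.sum(resid**2) / (n - p - 1)) / (np.sum((y - y.mean())**2) / (n - 1))

def lrvar_bartlett(x, k):
    """Newey-West long-run variance: Bartlett weights 1 - j/(k+1) up to lag k."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    lrv = np.dot(x, x) / n                   # lag-0 sample autocovariance
    for j in range(1, k + 1):
        gamma_j = np.dot(x[j:], x[:-j]) / n  # lag-j sample autocovariance
        lrv += 2 * (1 - j / (k + 1)) * gamma_j
    return lrv

def r2_longrun(y, resid, k):
    """Hypothetical LRVar-based analogue: 1 - LRVar(resid) / LRVar(y)."""
    return 1 - lrvar_bartlett(resid, k) / lrvar_bartlett(y, k)

# Usage on simulated overlapping data (k-period sums of white noise).
rng = np.random.default_rng(0)
n, k = 1000, 5
window = np.ones(k)
X = np.convolve(rng.normal(size=n + k - 1), window, mode="valid")    # length n
eps = np.convolve(rng.normal(size=n + k - 1), window, mode="valid")  # length n
y = 1.0 + 0.5 * X + eps
resid, p = ols_residuals(y, X)
print(r2_vanilla(y, resid), r2_adjusted(y, resid, p), r2_longrun(y, resid, k))
```

With `k = 0` the Bartlett sum keeps only the lag-0 term, so `r2_longrun` reduces to the vanilla $\hat R^2$.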

Questions

1. Should an estimator of the long-run variance be used in estimating $R^2$, or should one stick to the regular estimators as in $\hat R^2$ and $\hat R^2_{adj.}$ above?
2. Would the choice of regular variance vs. long-run variance have any effect, given that $\text{Var}(\varepsilon)$ (or $\text{LRVar}(\varepsilon)$) is in the numerator and $\text{Var}(y)$ (or $\text{LRVar}(y)$) is in the denominator, hinting at possible cancellations?
3. How would the interpretation of these estimators of $R^2$ (one employing the regular variance estimator and another employing the long-run variance estimator) differ?
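One special case in which the cancellation hinted at in the second question is exact can be worked out by hand. Assume (an illustrative assumption, not part of the question) that the underlying components are white noise: $\xi_\tau$ has variance $\sigma^2$ and is uncorrelated across $\tau$. For the overlapping sum $x_t=\sum_{\tau=t-k+1}^t \xi_\tau$ of the footnote below,

$$ \text{Cov}(x_t,x_{t-j})=\begin{cases}(k-j)\,\sigma^2, & 0\le j<k,\\ 0, & j\ge k,\end{cases} $$

so $\text{Var}(x)=k\sigma^2$, the $j=k$ term in the definition of $\text{LRVar}$ is zero, and

$$ \text{LRVar}(x)=k\sigma^2+2\sigma^2\sum_{j=1}^{k-1}(k-j)=k\sigma^2+k(k-1)\sigma^2=k^2\sigma^2. $$

If $y_t$ and $\varepsilon_t$ are both overlapping sums of this kind, built from white-noise components with variances $\sigma_y^2$ and $\sigma_\varepsilon^2$ (as happens, for instance, when a daily model $\xi_\tau=\beta\zeta_\tau+\eta_\tau$ with mutually uncorrelated white noise $\zeta_\tau,\eta_\tau$ is summed over the window), then

$$ \frac{\text{LRVar}(\varepsilon)}{\text{LRVar}(y)}=\frac{k^2\sigma_\varepsilon^2}{k^2\sigma_y^2}=\frac{k\,\sigma_\varepsilon^2}{k\,\sigma_y^2}=\frac{\text{Var}(\varepsilon)}{\text{Var}(y)}, $$

and the two population versions of $R^2$ coincide. When $y$ and $\varepsilon$ have different autocorrelation patterns, the factors need not cancel.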

*By overlapping observations of $x_t$ where the overlap is of length $k$ I mean a case where $x_t=\sum_{\tau=t-k+1}^t \xi_\tau$ where $\xi_\tau$ is some random process. Hence, $x_t$ and $x_{t-\kappa}$ measure partly the same thing for $\kappa<k$; they "overlap". An example would be measuring monthly financial returns every day. The monthly return $x_t$ of today overlaps with the monthly return of yesterday $x_{t-1}$ to a large degree: given a month with 30 trading days, 29 daily returns $\xi_{t-29},\dots,\xi_{t-1}$ constitute both $x_t$ and $x_{t-1}$, while only $\xi_{t}$ and $\xi_{t-30}$ make $x_t$ and $x_{t-1}$ differ. (How many trading days a month has depends on the market.)
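The construction in this footnote can be sketched in a few lines. The sketch assumes (for illustration only) that the daily components $\xi_\tau$ are white noise, in which case the autocorrelation of $x_t$ at lag $j$ is $(k-j)/k$ for $j<k$ and $0$ otherwise. The function names are hypothetical.

```python
import numpy as np

def overlapping_sums(xi, k):
    """Return x_t = xi_{t-k+1} + ... + xi_t for every t with a full window of k terms."""
    c = np.concatenate([[0.0], np.cumsum(np.asarray(xi, dtype=float))])
    return c[k:] - c[:-k]  # window sums as differences of cumulative sums

def theoretical_acf(j, k):
    """Lag-j autocorrelation of x_t when xi is white noise: (k - j)/k for j < k, else 0."""
    return max(k - j, 0) / k

# Simulated daily "returns" (white noise); monthly returns sampled every day, k = 30.
rng = np.random.default_rng(1)
k = 30
daily = rng.normal(0.0, 0.01, size=5000)
monthly = overlapping_sums(daily, k)
for j in (1, 10, 29, 30):
    sample = np.corrcoef(monthly[j:], monthly[:-j])[0, 1]
    print(f"lag {j}: sample {sample:.3f}, theoretical {theoretical_acf(j, k):.3f}")
```

At lag 1 the theoretical autocorrelation is $29/30$, matching the 29 shared daily returns in the example.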

**I guess estimating $\text{LRVar}$ by just plugging in sample counterparts of population quantities may not be a good idea in cases where $k\ll n$.
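One concrete problem with the naive plug-in (equal weights on the sample autocovariances up to lag $k$) is that it is not guaranteed to be non-negative, whereas the Newey-West estimator, which down-weights lag $j$ by the Bartlett weight $1-j/(k+1)$, always is. A minimal sketch, using a deliberately extreme toy series rather than overlapping data; the function names are hypothetical.

```python
import numpy as np

def sample_autocov(x, j):
    """Lag-j sample autocovariance with divisor n (not n - j)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    return np.dot(x[j:], x[: n - j]) / n

def lrvar_plugin(x, k):
    """Plug-in: gamma_0 + 2 * (gamma_1 + ... + gamma_k), equal weights."""
    return sample_autocov(x, 0) + 2 * sum(sample_autocov(x, j) for j in range(1, k + 1))

def lrvar_newey_west(x, k):
    """Newey-West: lag-j term down-weighted by the Bartlett weight 1 - j/(k+1)."""
    return sample_autocov(x, 0) + 2 * sum(
        (1 - j / (k + 1)) * sample_autocov(x, j) for j in range(1, k + 1)
    )

# Toy series with perfectly alternating signs: gamma_0 = 1, gamma_1 = -0.95.
x = np.array([1.0, -1.0] * 10)
print(lrvar_plugin(x, 1))      # negative: 1 + 2 * (-0.95)
print(lrvar_newey_west(x, 1))  # positive: 1 + 2 * 0.5 * (-0.95)
```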

Could you provide an explicit definition or description of what you mean by "overlapping observations"? — whuber, Oct 19 '19 at 13:45
Thank you. I find your question confusing because the covariances of the $x_i$ don't enter into any of the formulas you give for $R^2,$ so how could they possibly be relevant? — whuber, Oct 19 '19 at 14:11
@whuber, $x$ is intended to denote a generic variable, so one could have $y$, $\varepsilon$ and any other relevant variable in place of $x$. But I understand my formulation may be confusing. I have edited again to clarify. — Richard Hardy, Oct 19 '19 at 14:42
Thank you -- but now your conditions seem contradictory. When the $\varepsilon$ are correlated, what justifies these formulas for $R^2$? What do they even mean in that case? What exactly are your variance estimators? Why aren't you applying the appropriate generalized least squares estimators? — whuber, Oct 19 '19 at 15:32
@whuber, yours are the questions I am trying to get answers to. I am not asking about the case of nonoverlapping observations. I am asking how I should adapt the formulas and interpretation of $R^2$ and its estimators in the case of overlapping observations. I have not specified variance estimators exactly (though I mention Newey-West as a candidate) as I do not know exactly what they should be for $\hat R^2$ vs. $\hat R^2_{adj.}$. It would be great to get an answer that explains these things. I bet you know enough to compose one. — Richard Hardy, Oct 19 '19 at 16:03
@whuber, also, I would use OLS rather than GLS as the efficient GMM estimator for $\beta$ would be the OLS estimator and the efficient variance estimator would be the plug-in version of $\text{LRVar}$, according to Hayashi "Econometrics" Sections 6.6 and 6.8. (Though some other texts and my own simulations suggest the plug-in version is problematic in smallish samples as it can deliver negative values quite often.) — Richard Hardy, Oct 19 '19 at 16:13
Related: ["Effective sample size of a time series of overlapping observations"](https://stats.stackexchange.com/questions/439961). — Richard Hardy, Dec 09 '19 at 09:44

crux26 · Answer 1 · 2019-10-19T15:48:45.737

I will refer to the population, vanilla and adjusted $R^2$ as (1), (2) and (3), respectively.

Q1) As (1) is for the population while (2) and (3) are its sample analogues, the same will hold for LRVar. For the population you would use $k=\infty$, and some finite integer for the sample.

Q2) I haven't done the calculation, but using LRVar will make a difference. Given that Newey-West is meant to take autocorrelation in the errors into account, it will have less SE compared to OLS or HC estimators. It returns "more conservative" values, so I presume using LRVar will result in a smaller $R^2$.

Q3) Technically they will differ, but in terms of interpretation I wouldn't bother. SEs and p-values do matter, but $R^2$ merely summarizes the overall fit, and its value by itself is often not that meaningful. Adjusted $R^2$ < unadjusted $R^2$ will hold for both the regular variance and the long-run variance, so I would just use the regular one, which is easier.
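For the regular-variance estimators defined in the question, the ordering can be checked directly. Writing $\text{SSR}=\sum_{i=1}^n\hat\varepsilon_i^2$ and $\text{SST}=\sum_{i=1}^n(y_i-\bar y)^2$,

$$ 1-\hat R^2_{adj.}=\frac{\text{SSR}/(n-p-1)}{\text{SST}/(n-1)}=\frac{\text{SSR}}{\text{SST}}\cdot\frac{n-1}{n-p-1}=(1-\hat R^2)\,\frac{n-1}{n-p-1}, $$

so whenever $p\ge 1$ and $\hat R^2<1$, the factor $\frac{n-1}{n-p-1}>1$ gives $\hat R^2_{adj.}<\hat R^2$. This identity covers only the regular-variance versions.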

edited Oct 19 '19 at 15:48

answered Oct 19 '19 at 14:37

crux26

Thank you for your answer. Regarding Q1, $k$ is known, so it would be incorrect to use $k=\infty$ in population. In sample, $k$ might not be relevant because of my second note **. Regarding Q2, both the numerator and the denominator will be affected in a similar way, so why should $R^2$ end up smaller? Regarding Q3, $R^2$ is a meaningful measure of fit, and I suspect it can have more than one interpretation depending on the nature of the data (nonoverlapping vs. overlapping being one example). I am expecting a more concrete answer to Q3. – Richard Hardy Oct 19 '19 at 14:51
Q1) Errors can be autocorrelated at arbitrarily long lags, so you should consider $k=\infty$ for the population or a "large sample". Q2) On second thought, they may be the same in the OLS case: NW adjusts only the significance of the $\beta$s, not their estimates. But as you pointed out, it could be different in more general cases. Q3) If you're judging a model's validity against some threshold $R^{2}$ value, say 90%, then you should always go for HAC estimators. If the errors are not correlated, regular = HAC; if they are correlated, the regular one is wrong. – crux26 Oct 19 '19 at 16:12
Regarding Q1, once again, $k$ is known and fixed by design in the case of overlapping observations. Your suggested solution addresses a different problem than the one specified in the OP. Re Q2, *it will have less SE compared to OLS* is incorrect; the SE will be larger, not smaller. Re Q3, I do not see how this answers the actual question. – Richard Hardy Oct 19 '19 at 18:17
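The remark in the comments that Newey-West adjusts only the significance of $\hat\beta$, not the estimates themselves, can be illustrated with a minimal sketch: the OLS point estimates do not depend on the lag $k$, and only the sandwich covariance does. The sketch assumes a single regressor plus an intercept, the textbook Bartlett-weighted sandwich without small-sample corrections, and simulated overlapping data; the function name is hypothetical.

```python
import numpy as np

def ols_with_newey_west(y, X, k):
    """OLS estimates plus classical and Newey-West (Bartlett, lag k) covariances.

    Basic textbook form without small-sample corrections (an assumption of this sketch).
    """
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta = ZtZ_inv @ Z.T @ y            # plain OLS estimates, whatever k is
    u = y - Z @ beta
    cov_classical = (u @ u / (n - Z.shape[1])) * ZtZ_inv
    g = Z * u[:, None]                  # rows are the scores z_t * u_t
    S = g.T @ g                         # lag-0 term
    for j in range(1, k + 1):
        gamma_j = g[j:].T @ g[:-j]      # sum over t of g_t g_{t-j}'
        S += (1 - j / (k + 1)) * (gamma_j + gamma_j.T)
    cov_nw = ZtZ_inv @ S @ ZtZ_inv      # sandwich: only this part depends on k
    return beta, cov_classical, cov_nw

# Simulated overlapping data: regressor and error are both 20-period sums of white noise.
rng = np.random.default_rng(2)
n, k = 2000, 20
window = np.ones(k)
X = np.convolve(rng.normal(size=n + k - 1), window, mode="valid")
y = 0.5 * X + np.convolve(rng.normal(size=n + k - 1), window, mode="valid")

beta0, cov_cl, _ = ols_with_newey_west(y, X, 0)
beta_k, _, cov_nw = ols_with_newey_west(y, X, k)
print("estimates (lag 0, lag k):", beta0, beta_k)
print("slope SE classical:", np.sqrt(cov_cl[1, 1]), "Newey-West:", np.sqrt(cov_nw[1, 1]))
```

With positively autocorrelated regressor and errors, as here, the Newey-West standard error of the slope comes out larger than the classical one.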
