Given a linear model

$$ y=X\beta+\varepsilon, $$

the population value of $R^2$ is

$$ R^2=1-\frac{\text{Var}(\varepsilon)}{\text{Var}(y)}. $$

The vanilla estimator of $R^2$ is

$$ \hat R^2=1-\frac{\widehat{\text{Var}}_{biased}(\varepsilon)}{\widehat{\text{Var}}_{biased}(y)}=1-\frac{\frac{1}{n}\sum_{i=1}^n \hat\varepsilon_i^2}{\frac{1}{n}\sum_{i=1}^n (y_i-\bar{y})^2} $$

and the adjusted estimator of $R^2$ is

$$ \hat R^2_{adj.}=1-\frac{\widehat{\text{Var}}_{unbiased}(\varepsilon)}{\widehat{\text{Var}}_{unbiased}(y)}=1-\frac{\frac{1}{n-p-1}\sum_{i=1}^n \hat\varepsilon_i^2}{\frac{1}{n-1}\sum_{i=1}^n (y_i-\bar{y})^2}. $$

This is applicable to the case where the observations of variables do not overlap. Meanwhile, I am interested in the case when they do.

Under overlapping observations where the overlap is of length $k$*, the long-run variance of a generic variable $x$ (where we may put $y$ or $\varepsilon$ in its place as needed) is

$$ \text{LRVar}(x)=\sum_{j=-k}^k \text{Cov}(x_t,x_{t-j})=\text{Var}(x)+2\sum_{j=1}^k \text{Cov}(x_t,x_{t-j}) $$

and some estimators for it (like Newey-West) are available.**
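To make the comparison concrete, here is a minimal Python sketch of the quantities above. It is only an illustration under my own assumptions: the function names `r2_pair` and `lrvar_bartlett` are hypothetical, the long-run variance is estimated with Bartlett (Newey-West-type) weights truncated at lag $k$, and the simulated data are rolling sums of i.i.d. normal "daily" shocks with arbitrary coefficients.

```python
import numpy as np

def r2_pair(y, X):
    """OLS of y on X plus an intercept; returns (R^2, adjusted R^2, residuals)."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])          # add the intercept column
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)  # least-squares coefficients
    resid = y - Z @ beta
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                    # the 1/n factors cancel
    r2_adj = 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))
    return r2, r2_adj, resid

def lrvar_bartlett(x, k):
    """Long-run variance estimate with Bartlett weights, truncated at lag k."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    total = x @ x / n                             # lag-0 term (biased variance)
    for j in range(1, k + 1):
        weight = 1.0 - j / (k + 1.0)              # Bartlett weight for lag j
        total += 2.0 * weight * (x[j:] @ x[:-j]) / n
    return total

# Simulated example (all numbers are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n_daily, k = 2000, 20
roll = lambda xi: np.convolve(xi, np.ones(k), mode="valid")  # overlapping k-sums
x_over = roll(rng.standard_normal(n_daily))
eps_over = roll(rng.standard_normal(n_daily))
y_over = 1.0 + 0.5 * x_over + eps_over

r2, r2_adj, resid = r2_pair(y_over, x_over.reshape(-1, 1))
r2_lr = 1.0 - lrvar_bartlett(resid, k) / lrvar_bartlett(y_over, k)
print(f"R2 = {r2:.3f}, adjusted R2 = {r2_adj:.3f}, long-run-variance R2 = {r2_lr:.3f}")
```

The last line prints the regular, adjusted, and long-run-variance versions side by side, so one can see on a given sample how far the numerator and denominator corrections offset each other.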
Questions
- Should an estimator of the long-run variance be used in estimating $R^2$, or should one stick to the regular estimators as in $\hat R^2$ and $\hat R^2_{adj.}$ above?
- Would the choice of regular variance vs. long-run variance have any effect, given that $\text{Var}(\varepsilon)$ (or $\text{LRVar}(\varepsilon)$) is in the numerator and $\text{Var}(y)$ (or $\text{LRVar}(y)$) is in the denominator, hinting at possible cancellations?
- How would the interpretation of these estimators of $R^2$ (one employing the regular variance estimator and another employing the long-run variance estimator) differ?
*By overlapping observations of $x_t$ where the overlap is of length $k$ I mean a case where $x_t=\sum_{\tau=t-k+1}^t \xi_\tau$ where $\xi_\tau$ is some random process. Hence, $x_t$ and $x_{t-\kappa}$ measure partly the same thing for $\kappa<k$; they "overlap". An example would be measuring monthly financial returns every day. The monthly return $x_t$ of today overlaps with the monthly return of yesterday $x_{t-1}$ to a large degree: given a month with 30 trading days, 29 daily returns $\xi_{t-29},\dots,\xi_{t-1}$ constitute both $x_t$ and $x_{t-1}$, while only $\xi_{t}$ and $\xi_{t-30}$ make $x_t$ and $x_{t-1}$ differ. (How many trading days a month has depends on the market.)
**I guess estimating $\text{LRVar}$ by just plugging in sample counterparts of population quantities may not be a good idea in cases where $k \ll n$.
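The overlap structure from the first footnote can be sketched in Python as follows. This is an illustration under an extra assumption that is not part of the question: $\xi_\tau$ is taken to be i.i.d. with unit variance, in which case $\text{Cov}(x_t,x_{t-j})=k-j$ for $0\le j<k$ and zero beyond, so the autocovariances sum to $k^2$. The helper names are hypothetical.

```python
import numpy as np

def overlapping_sums(xi, k):
    """x_t = sum of the last k values of xi (one value per period once k are available)."""
    return np.convolve(xi, np.ones(k), mode="valid")

def sample_autocov(x, j):
    """Sample autocovariance at lag j with 1/n normalisation."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    return (x[j:] @ x[:n - j]) / n

def lrvar_plugin(x, k):
    """Unweighted plug-in: lag-0 term plus twice the sample autocovariances at lags 1..k-1."""
    return sample_autocov(x, 0) + 2.0 * sum(sample_autocov(x, j) for j in range(1, k))

def theoretical_autocov(j, k):
    """Autocovariance of x_t at lag j when xi is i.i.d. with unit variance (assumption)."""
    return float(max(k - abs(j), 0))

k = 30                                   # e.g. 30 trading days per month
rng = np.random.default_rng(1)
xi = rng.standard_normal(50_000)         # simulated i.i.d. daily shocks (assumption)
x = overlapping_sums(xi, k)

rho1_sample = sample_autocov(x, 1) / sample_autocov(x, 0)
rho1_theory = theoretical_autocov(1, k) / theoretical_autocov(0, k)   # (k-1)/k
lrvar_theory = sum(theoretical_autocov(j, k) for j in range(-k, k + 1))  # equals k**2

print("lag-1 autocorrelation, sample vs. theory:", rho1_sample, rho1_theory)
print("long-run variance, plug-in vs. theory:", lrvar_plugin(x, k), lrvar_theory)
```

Rerunning this with a much shorter `xi` relative to `k` gives a feel for how the plug-in estimate behaves when the sample is not long compared with the overlap.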