
My question is motivated by "The Myth of Long-Horizon Predictability".

Suppose I have $X$ and $Y$, where $X$ is stationary but highly autocorrelated and $Y$ represents a stock price. Say I do the following time series regression:

$Y(t+5\,\mathrm{yr})/Y(t) = X(t)\,b + e(t)$, with all the standard assumptions holding.
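To make the setup concrete, here is a minimal sketch of how the overlapping long-horizon response variable is built from a price series. The data and parameter names (`price`, `horizon`) are illustrative, not from the paper; `horizon` plays the role of the 5-year window.

```r
set.seed(1)
price <- cumprod(1 + rnorm(120, 0.005, 0.02))  ## toy monthly price path
horizon <- 60                                   ## 5 years, in months
n <- length(price) - horizon
## overlapping long-horizon gross returns Y(t+horizon)/Y(t)
long_ret <- price[(1 + horizon):(n + horizon)] / price[1:n]
```

Adjacent observations of `long_ret` share `horizon - 1` of their underlying periods, which is where the strong autocorrelation of the response comes from.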

Here is my understanding of the setting of a long-horizon return regression:

I have a highly autocorrelated sample of the response variable, as well as of $X$. As far as I can tell, this alone still gives me an unbiased, consistent estimate of $b$ and causes no problem (as long as there is an autocovariance function characterizing the joint behaviour of returns and $X$).

Now, as the authors of this paper (http://pages.stern.nyu.edu/~rwhitela/papers/mlhp%20rfs08.pdf) state, the following two are the reasons for an extremely high R-squared in a long-horizon setting:

  1. Any unusual draw from the returns at time t will influence the returns for k periods, where k is the time horizon.

This is just saying that the returns are highly autocorrelated.

  2. A persistent regressor will have very similar values at t, t−1, t−2, ..., t−k.

Apparently, combining these two results in "very high sampling error across time", as the authors put it. I find this argument hand-wavy. Why would one expect the sample of $(\text{returns}, \text{dividends})$ pairs to be unrepresentative of their true joint distribution? Can someone provide a more formal treatment?
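For reference, the mechanism the authors describe can be reproduced in a few lines: even when daily returns are iid and completely independent of the predictor, regressing overlapping long-horizon sums on a persistent AR(1) series produces average R-squared values that grow with the horizon. This is a sketch with illustrative parameter choices (`rho`, `nsim`, `n` are my own), not the paper's exact design.

```r
mean_r2 <- function(horizon, rho = 0.98, nsim = 200, n = 100) {
  r2 <- numeric(nsim)
  for (i in seq_len(nsim)) {
    y <- rnorm(n + horizon)                        ## iid "daily" returns
    ## overlapping horizon-period sums: the long-horizon response
    s <- sapply(1:n, function(l) sum(y[l:(l + horizon)]))
    ## persistent AR(1) predictor, independent of the returns
    z <- as.numeric(arima.sim(model = list(ar = rho), n = n))
    r2[i] <- summary(lm(s ~ z))$r.squared
  }
  mean(r2)
}

set.seed(42)
mean_r2(1)    ## short horizon: small average R-squared
mean_r2(60)   ## long horizon: noticeably larger average R-squared
```

The point is that under the null the overlapping sums `s` are themselves highly persistent, so a sample of 100 points contains far fewer effectively independent observations, and spuriously high sample correlations with another smooth series become likely.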

Additionally, I have run a simulation in R (script below):

rsquaredmean <- function(autocorrelation, horizon) {

  j <- rep(NA, 30)   ## stores the R-squared from each of the 30 simulations

  for (i in 1:30) {

    y <- rnorm(400 + horizon)   ## daily-return-like variable, completely random

    s <- rep(NA, 100)           ## proxy for long-horizon returns:
    for (l in 1:100) {          ## overlapping sums of the daily returns
      s[l] <- sum(y[l:(l + horizon)])  ## parentheses needed: l:l+horizon parses as (l:l)+horizon
    }

    z <- rep(NA, 100)   ## highly autocorrelated predictor,
                        ## meant to be a dividends-like variable
    z[1] <- 1
    for (m in 1:99) {
      z[m + 1] <- autocorrelation * z[m] + rnorm(1, 0, 1)
    }

    x <- lm(s ~ z)      ## fit once per simulation, after z is fully built
    j[i] <- summary(x)$r.squared
  }

  return(mean(j))
}

I don't see the mean R-squared increasing as I increase the horizon. What am I missing in the simulation?

  • Not intentional, I just fixed it. However I'm getting similar R squared means across horizons. – Arshdeep Feb 18 '22 at 16:08
  • Sorry for the terrible code. It works now and the result is still unchanged, although the R-squareds are more stable. – Arshdeep Feb 18 '22 at 16:14
