Not due to mean-reverting behaviour
I agree with sp59b2 that a longer time span and a larger number of points will improve the results.
However, I do not believe that it is due to the mean-reverting behaviour at the beginning of the time series when the starting point is far away. This is because you initialize the first point such that it is already distributed according to the limiting long-run distribution.
(Although I would argue that the variance should be $\sigma^2/\theta$ instead of $\sigma^2/(2\theta)$, based on the recursion $\text{Var}(x) = \text{Var}(x) (1-\theta \Delta t) + \sigma^2 \Delta t$. In any case, this does not seem to be the reason for the high p-values: if you initialize correctly, so that there is no initial phase of reverting to the mean, you still get the high p-values.)
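For concreteness, here is a minimal sketch of such a simulation. This is not the actual timeseries() function from the answer linked in the code further down; the signature is only assumed to match the call used there, and the first point is drawn from the long-run distribution rather than from a fixed starting value.

### minimal sketch of an Euler-discretized Ornstein-Uhlenbeck simulation,
### with a signature assumed to match timeseries(time, n, mu, theta, sigma)
timeseries <- function(time, n, mu, theta, sigma) {
  dt <- time / n
  x  <- numeric(n)
  ### start in (approximately) the long-run distribution, so there is no
  ### initial phase of reverting to the mean; sd = sigma/sqrt(2*theta) is the
  ### commonly quoted value (the parenthetical above argues for sigma/sqrt(theta))
  x[1] <- rnorm(1, mean = mu, sd = sigma / sqrt(2 * theta))
  for (i in 2:n) {
    x[i] <- x[i - 1] + theta * (mu - x[i - 1]) * dt + sigma * sqrt(dt) * rnorm(1)
  }
  x
}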
Instead, the reason why longer runs are more likely to make you reject the null hypothesis is that they give the test more power.
Power
> the null hypothesis (of non-stationarity) is typically accepted with high p-value
A high p-value does not mean that you accept the null hypothesis.
Instead, it just means that you do not (cannot) reject the null hypothesis.
It means that either the null hypothesis is true, or the effect (the deviation from a unit root) is not large enough to be distinguished from a unit root.
Only with a larger effect (a root further away from unity) or with a larger amount of data will you be able to detect the stationarity more reliably.
When the null hypothesis is true, i.e. when you have a unit root, the p-value should be uniformly distributed, and the probability of rejecting should equal the $\alpha$ level.
However, when the null hypothesis is not true, you can still be arbitrarily close to the null hypothesis (a root that is very close to a unit root), which means that you won't reject with much more probability than the $\alpha$ level. See for instance the graph produced by the code below, which shows estimates of the power as a function of $\theta$. If $\theta$ is close to 0, the rejection probability will be close to the rejection probability under the null hypothesis (in this case 0.05). When $\theta$ increases (when the alternative differs more from the null), the probability that the test rejects the null hypothesis increases as well.

### function timeseries() from https://stats.stackexchange.com/a/491928
### compute the rejection rate for 100 different values of theta
mup <- rep(0, 100)
for (i in 1:100) {
  ### count the rejections (p < 0.05) out of one hundred tests
  mup[i] <- sum(replicate(10^2,
    tseries::adf.test(timeseries(time = 1, n = 1000, mu = 0,
                                 theta = i, sigma = 2))$p.value < 0.05))
}
plot(1:100, mup / 100,
     xlab = expression(theta), ylab = "fraction of rejections p < 0.05")
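In the same spirit, here is a small additional sketch (reusing the same timeseries() call, with hypothetical parameter values): under a unit root the rejection rate should stay near the $\alpha$ level, and for a fixed $\theta$ a longer time span should give a higher rejection rate.

### rejection rate under the null: a plain random walk (true unit root)
rej_null  <- mean(replicate(10^2,
  tseries::adf.test(cumsum(rnorm(1000)))$p.value < 0.05))

### rejection rate for a fixed theta over a short and a long time span
rej_short <- mean(replicate(10^2,
  tseries::adf.test(timeseries(time = 1, n = 1000, mu = 0,
                               theta = 5, sigma = 2))$p.value < 0.05))
rej_long  <- mean(replicate(10^2,
  tseries::adf.test(timeseries(time = 5, n = 5000, mu = 0,
                               theta = 5, sigma = 2))$p.value < 0.05))

c(null = rej_null, short = rej_short, long = rej_long)
### expected pattern: rej_null stays near 0.05, and rej_long > rej_short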
P-value calculation
In addition, with the Dickey-Fuller test the p-values are computed in a particular way and are not exact. What you are effectively doing is fitting a linear model of the form
$$(x_t-x_{t-1}) = \alpha x_{t-1} + \beta_0 + \beta_1 t + \epsilon_t$$
and the null hypothesis is $\alpha = 0$. The Dickey-Fuller test uses the same t-value of the coefficient as is normally computed for linear regression. However, the p-value is not based on the t-distribution (this is because the error terms $\epsilon_t$ and the regressor are not independent: if some $x_t$ is higher or lower due to randomness, then this influences both the residual $\epsilon_t$ at time $t$ and the regressor $x_t$ in the equation for the next time step).
The p-values are instead computed by interpolation from tables, and this might also influence the p-value (although I believe this does not have a strong effect; it is mentioned just to be complete).
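To make the link with the linear model concrete, here is a minimal sketch (on a simulated random walk) that fits the regression above with lm() and compares it with tseries::adf.test at lag order 0; the t-value of the lagged level is essentially the Dickey-Fuller statistic, but the usual t-distribution based p-value printed next to it is not the one that should be used.

set.seed(1)
x <- cumsum(rnorm(200))             ### a random walk, so the unit-root null is true

dx    <- diff(x)                    ### x_t - x_{t-1}
xlag  <- x[-length(x)]              ### x_{t-1}
trend <- seq_along(dx)              ### t

### the Dickey-Fuller regression with drift and trend
fit <- lm(dx ~ xlag + trend)
summary(fit)$coefficients["xlag", ] ### t value ~ Dickey-Fuller statistic, but the
                                    ### t-distribution based p-value is not valid here
tseries::adf.test(x, k = 0)         ### p-value interpolated from the Dickey-Fuller tables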