The regression form of the AR model that you specify departs from the assumptions of the classical regression model in that the explanatory variables are not deterministic (fixed). They are stochastic because they are lags of the dependent variable and hence depend on the disturbance term $\epsilon_i$. I think this is what you are referring to as a random design.
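For concreteness, this is the specification I understand the question to refer to (an AR model with $q$ lags, written without an intercept for simplicity):
$$
X_i = \beta_1 X_{i-1} + \beta_2 X_{i-2} + \cdots + \beta_q X_{i-q} + \epsilon_i \,, \qquad \epsilon_i \sim \mathrm{iid}(0, \sigma^2) \,.
$$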
To avoid confusion with the dependent variable and for simplicity, I will denote as $Z$ the whole set of explanatory variables $X_{i-1}$, $X_{i-2}$,..., $X_{i-q}$.
In the context of stochastic regressors, we may have three situations, the explanatory variables and the disturbance term $\epsilon_i$ are:
- independent of each other, $E(Z_i \epsilon_{i-k}) = 0$, $\forall i,k$;
- contemporaneously uncorrelated, $E(Z_i \epsilon_i) = 0$, $\forall i$;
- contemporaneously correlated, $E(Z_i \epsilon_i) \neq 0$ for some $i$.
In the context of the AR model we are in the second case (lagged versions of $X$ depend on lagged values of $\epsilon$, but not on the current value at time $i$ or at future times).
Your suggestion of conditioning on $Z$ is not appropriate here. To see why, note that the joint distribution of the dependent variable $X$ and the regressors $Z$ factors into the product of the conditional and the marginal distributions:
$$
f(X,Z;\vec{\beta},\sigma^2) = f(X|Z;\vec{\beta},\sigma^2) f(Z;\cdot) \,.
$$
If the marginal distribution did not depend on the parameters of the model, then we could get rid of it and work with the conditional distribution. However, since the explanatory variables follow the same AR model, the marginal distribution also depends on the parameters of the model and we actually have $f(Z;\vec{\beta},\sigma^2)$. Hence, we cannot ignore it, and things are not as easy as they would be in the other case.
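As a concrete illustration (taking, for simplicity, a stationary Gaussian AR(1) with coefficient $\beta$), the marginal distribution of a single regressor already involves the parameters of the model:
$$
X_{i-1} \sim N\!\left(0,\; \frac{\sigma^2}{1-\beta^2}\right) \,,
$$
so the factor $f(Z;\cdot)$ cannot be treated as free of $\vec{\beta}$ and $\sigma^2$.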
Although the explanatory variables are not independent of the innovations $\epsilon$, they are contemporaneously uncorrelated, i.e., at a given time $i$, $Z_{i-j}$ with $j>0$ depends on past values of $\epsilon$ but not on the current value $\epsilon_i$. In this case, the distribution of the test statistic is unknown in finite (small) samples. However, the classical results still apply asymptotically, and the $F$-test statistic (after the normalization discussed under Edit 2 below) asymptotically follows the $\chi^2$ distribution with $q-p$ degrees of freedom.
Note: be careful, the degrees of freedom are $(q-p)$, not the $(q-p,\, n-q)$ that you suggest; the latter would be the degrees of freedom of Snedecor's $F$-distribution to be used in finite samples when the classical assumptions hold.
Details (in addition to the reference given by @CristiánAntuña): the above holds under the classical assumptions plus the requirement that $\operatorname{plim}\, Z'Z/n = Q$ is a finite and positive definite matrix. The Mann and Wald theorem, Slutsky's theorem and Cramér's theorem can then be applied to show the above result.
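To sketch what those theorems deliver (the standard asymptotic result for predetermined regressors, under the conditions just stated):
$$
\sqrt{n}\,\bigl(\hat{\vec{\beta}} - \vec{\beta}\bigr) \;\xrightarrow{d}\; N\!\left(0,\; \sigma^2 Q^{-1}\right) \,,
$$
which is what justifies comparing the (suitably normalized) Wald/$F$-type statistic with the $\chi^2_{q-p}$ distribution in large samples.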
Edit 2
Note: when using the asymptotic test statistic, be aware that it needs to be normalized before comparing it with the $\chi^2$ distribution. In particular, the standard $F$-test statistic must be multiplied by the degrees of freedom in the numerator of the $F$ distribution. To see this, note that if a random variable $Y$ follows an $F$ distribution and $X_1$, $X_2$ denote $\chi^2$-distributed variables with $n_1$ and $n_2$ degrees of freedom respectively, then $Y$ can be written as $Y=\frac{X_1/n_1}{X_2/n_2}$, i.e., the ratio of the two $\chi^2$ variables, each divided by its degrees of freedom. As the degrees of freedom in the denominator of the $F$-test go to infinity, $X_2/n_2$ goes to one; hence we are left with $Y=X_1/n_1$, and it is $n_1\times Y$ (not $Y$) that follows the $\chi^2$ distribution.
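A quick numerical check of this normalization (an illustrative snippet I add here; the numerator degrees of freedom, 2, match the $q-p=2$ case simulated below):
# for a large denominator degrees of freedom, the F quantile times the
# numerator degrees of freedom approaches the chi-square quantile
df1 <- 2
df1 * qf(0.95, df1, 1e6) # approximately 5.99
qchisq(0.95, df1)        # 5.991465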
One issue still remains to be taken into account. As the explanatory variables
are lagged versions of the same variable, they are likely to be correlated
(more than likely if $X_t$ actually follows an AR structure). This will inflate the standard errors of the parameter estimates, which may in turn affect the test statistic.
Roughly, the consequences of collinearity are the same as those caused by a small sample size: there isn't enough information in the data to discern to which variable the influence of their common component should be attributed. As we have already mentioned, in this context we need a large sample size, so bootstrapping the test statistic would be a useful strategy that may kill two birds with one stone (a small sketch of this idea is given at the end of the answer).
Someone may ask: why would you want to do this test? Why care about the $F$-test in an AR model? I don't think you want to test, for example, an economic theory, as is done with econometric models. If the purpose is to choose the optimal lag order for the AR model, then the Akaike or Bayesian information criterion might be a better alternative. Anyway, I find this an interesting question and a nice example to discuss the classical regression model and variations that relax some of its assumptions.
Edit (revised)
For illustration, I show a small simulation exercise to assess the issues discussed above. According to this small experiment, the asymptotic test statistic performs well for a sample size of 100 observations. The empirical level found in this exercise was 0.047, close to the chosen nominal level of 5%. For a smaller sample size of 60 observations, similar results were obtained. The R code that replicates the results of the exercise is shown below.
Generate data (1000 series of length 100) from an AR(2) model. The same data set is used in both cases below.
set.seed(123)
xdata <- matrix(nrow = 100, ncol = 1000)
n <- nrow(xdata) # sample size
niter <- ncol(xdata) # number of iterations
for (i in seq_len(niter))
  xdata[,i] <- arima.sim(n = n, model = list(ar = c(0.6, -0.2)))
Compute the empirical level of the asymptotic $\chi^2$-test statistic for the null hypothesis that the AR coefficients of orders 3 and 4 are zero.
chisq.stats <- pvals <- rep(NA, niter)
resids <- matrix(nrow = n-4, ncol = niter)
coefs <- matrix(nrow = niter, ncol = 2)
for (i in seq_len(niter))
{
  x <- xdata[,i]
  # restricted model
  fit1 <- lm(x[5:n] ~ x[4:(n-1)] + x[3:(n-2)])
  # save the estimated coefficients and residuals to reuse them
  # later in the bootstrap case and save computations
  coefs[i,] <- coef(fit1)[-1]
  resids[,i] <- residuals(fit1)
  # unrestricted model
  fit2 <- lm(x[5:n] ~ x[4:(n-1)] + x[3:(n-2)] + x[2:(n-3)] + x[1:(n-4)])
  # chi-square test statistic (F-statistic times numerator df = 2) and p-value
  chisq.stats[i] <- 2 * anova(fit1, fit2)$F[2]
  pvals[i] <- pchisq(chisq.stats[i], df = 2, lower.tail = FALSE)
}
sum(pvals < 0.05) / niter
#[1] 0.047
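For completeness, here is a minimal sketch of the bootstrap idea mentioned earlier (my illustrative addition, reusing the objects created in the loops above; the number of replicates B and the choice of the first series are arbitrary): resample the restricted-model residuals, regenerate the series under the null AR(2) model with the saved coefficients, and recompute the statistic to approximate its null distribution.
# residual bootstrap of the test statistic for a single series
B <- 500 # number of bootstrap replicates (arbitrary choice)
boot.stats <- rep(NA, B)
for (b in seq_len(B))
{
  # resample the restricted-model residuals and rebuild a series
  # under the null AR(2) model with the estimated coefficients
  e <- sample(resids[,1], size = n, replace = TRUE)
  xb <- arima.sim(n = n, model = list(ar = coefs[1,]), innov = e)
  fit1b <- lm(xb[5:n] ~ xb[4:(n-1)] + xb[3:(n-2)])
  fit2b <- lm(xb[5:n] ~ xb[4:(n-1)] + xb[3:(n-2)] + xb[2:(n-3)] + xb[1:(n-4)])
  boot.stats[b] <- 2 * anova(fit1b, fit2b)$F[2]
}
# bootstrap p-value for the observed statistic of the first series
mean(boot.stats >= chisq.stats[1])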