I would like to know whether my simulation approach for finding the coverage of a confidence interval for a prediction $\boldsymbol{\beta}^T\boldsymbol{X}_N$ is correct:
- I generated a dataset of $n$ samples of covariates $\boldsymbol{X} \in \mathbb{R}^p$ and responses $Y \in \mathbb{R}$ that follow the linear model $Y_i = \boldsymbol{\beta}^T\boldsymbol{X}_i + \varepsilon_i$ for $i=1,\dots,n$. So I have a design matrix $\mathbb{X} \in \mathbb{R}^{n \times p}$ and a response vector $\mathbb{Y} \in \mathbb{R}^{n}$. Here I set $n = 512$ and $p = 1024$. (The data were generated from a multivariate standard normal distribution.)
- I created a new independent observation $\boldsymbol{X}_N \in \mathbb{R}^p$, $\boldsymbol{X}_N \sim N(0,I_p)$
- Compute $\widehat{\boldsymbol{\beta}} \in \mathbb{R}^p$ for the linear model
- Find the true value of $\boldsymbol{\beta}^T\boldsymbol{X}_N$ (since I can compute the true parameter $\boldsymbol{\beta}$)
- Compute an estimator $\hat{V}$ of the variance $\text{var}(\widehat{\boldsymbol{\beta}}^T\boldsymbol{X}_N)$
- Compute the confidence interval for $\boldsymbol{\beta}^T\boldsymbol{X}_N$ as $(\widehat{\boldsymbol{\beta}}^T\boldsymbol{X}_N \pm z_{\alpha/2}\hat{V}^{1/2})$ assuming asymptotic normality.
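
For concreteness, a single pass through the steps above could be sketched as follows. This is only a minimal sketch under assumptions that are not part of the setup described: it uses ordinary least squares with $n > p$ (with $n = 512$ and $p = 1024$ the least-squares solution is not unique, so a different estimator would be needed there), Gaussian noise with $\sigma = 1$, and the usual homoskedastic plug-in variance $\hat{\sigma}^2 \boldsymbol{X}_N^T(\mathbb{X}^T\mathbb{X})^{-1}\boldsymbol{X}_N$. The function name `simulate_once` and all default values are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def simulate_once(rng, beta, x_new, n, sigma=1.0, alpha=0.05):
    """One pass: simulate a dataset, fit OLS, build the interval for beta^T x_new.

    Assumes n > p so that X^T X is invertible (illustrative assumption).
    Returns (lower, upper, covered).
    """
    p = beta.shape[0]

    # Design matrix with rows ~ N(0, I_p) and responses from the linear model.
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)

    # OLS estimate of beta.
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)

    # Plug-in estimate of var(beta_hat^T x_new) given the design matrix.
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)
    v_hat = sigma2_hat * (x_new @ np.linalg.solve(XtX, x_new))

    # Normal-approximation interval: estimate +/- z_{alpha/2} * sqrt(V_hat).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = beta_hat @ x_new
    half_width = z * np.sqrt(v_hat)
    lower, upper = centre - half_width, centre + half_width

    # The true value is known in a simulation, so coverage can be checked.
    target = beta @ x_new
    covered = bool(lower <= target <= upper)
    return lower, upper, covered


rng = np.random.default_rng(0)
p, n = 10, 200                      # hypothetical sizes with n > p
beta = rng.standard_normal(p)       # hypothetical "true" parameter
x_new = rng.standard_normal(p)      # X_N, drawn once
lower, upper, covered = simulate_once(rng, beta, x_new, n)
```

Calling `simulate_once` many times and averaging `covered` would give an empirical coverage rate; which quantities to redraw on each repetition is exactly the part I am unsure about.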
Now I'm not sure how to proceed. Should I repeat the process from (3) or generate another dataset? Any help would be much appreciated.
Edit: I'm interested in the behavior of $\widehat{\boldsymbol{\beta}}^T\boldsymbol{X}_N$ since this is a univariate term. The new observation $\boldsymbol{X}_N$ is fixed once it is generated. So yes, I should say prediction interval, but the rest of the question remains.