This question is somewhat similar to question 147242. I'm dealing with a multiple linear regression model, say: $$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 $$ and I'm looking for an algebraic expression, to be evaluated numerically, for the prediction interval (PI) of a new prediction $y_0$.
In contrast to the previously discussed examples, in this case each of the model coefficients ($\beta_0$, $\beta_1$ and $\beta_2$) has an error bar. The error bars are obtained by bootstrapping; the resulting distributions are numerical (empirical) rather than analytic, and each of the three coefficients has its own distribution.
Most examples so far deal with either a simple linear regression or a multiple linear regression, but in each case only the contribution of the $x_1, \dots, x_n$ terms is considered (e.g., question 147242). This is already quite useful in itself; in my specific case, however, $\beta_0$, $\beta_1$ and $\beta_2$ are the mean values of distributions (see PPS). Is there a way to incorporate the uncertainty of the $\beta_i$ (i.e., their "error bars") into the calculation of the prediction interval (and of the confidence interval)?
Put simply: how can the equation $$ \hat{V}_f = s^2\, \mathbf{x}_0 (\mathbf{X}^T\mathbf{X})^{-1} \mathbf{x}_0^T + s^2 $$ be modified to also account for the fact that each coefficient is itself the mean of a distribution?
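For reference, the unmodified formula can be evaluated as in the NumPy sketch below, assuming an ordinary least-squares fit on a single data set with a leading column of ones in $\mathbf{X}$. The function name `ols_prediction_variance` and the synthetic data are hypothetical, chosen only for illustration. The first term of $\hat V_f$ is the uncertainty of the fitted mean at $\mathbf{x}_0$; the second is the noise of a new observation.

```python
import numpy as np

def ols_prediction_variance(X, y, x0):
    """Fit y = X beta by OLS and return (y0_hat, V_f, dof) for a new row x0, with
    V_f = s^2 * x0 (X^T X)^{-1} x0^T + s^2.

    X  : (n, p) design matrix whose first column is all ones (intercept)
    y  : (n,) response vector
    x0 : (p,) new design row, e.g. [1, x1_new, x2_new]
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimate of (beta_0, beta_1, beta_2)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)                  # unbiased estimate of the error variance s^2
    XtX_inv = np.linalg.inv(X.T @ X)
    y0_hat = x0 @ beta
    V_f = s2 * (x0 @ XtX_inv @ x0) + s2           # mean-response uncertainty + new-observation noise
    return y0_hat, V_f, n - p

# Synthetic demo data (illustrative only)
rng = np.random.default_rng(0)
n = 50
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)
x0 = np.array([1.0, 0.2, -1.0])

y0_hat, V_f, dof = ols_prediction_variance(X, y, x0)
t_crit = 2.012  # approx. two-sided 95% Student-t quantile for dof = 47
pi = (y0_hat - t_crit * np.sqrt(V_f), y0_hat + t_crit * np.sqrt(V_f))
```

The critical value can be obtained with `scipy.stats.t.ppf(0.975, dof)`; it is hard-coded here only so that the sketch depends on NumPy alone.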
(PS: One could create an ensemble of model instances with the $\beta_i$ drawn from their respective distributions and compute the CI of $y_0$ from the resulting distribution of $y_0$ values, but this is not computationally efficient and raises other issues that I would like to avoid.)
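The ensemble approach described in the PS can be sketched as below, assuming the coefficient ensemble is stored as an $N \times 3$ array `B` (one row per model instance). The function `ensemble_interval`, the optional residual standard deviation `sigma`, and the Gaussian noise model are all hypothetical choices for illustration. Using whole rows of `B`, rather than drawing each $\beta_i$ independently from its own marginal distribution, keeps any correlation between the coefficients; independent draws would discard it.

```python
import numpy as np

def ensemble_interval(B, x0, level=0.95, sigma=None, rng=None):
    """Percentile interval for y0 from an ensemble of coefficient vectors.

    B     : (N, p) array; row k holds (beta_0, beta_1, beta_2) of ensemble member k
    x0    : (p,) new design row [1, x1_new, x2_new]
    sigma : optional residual standard deviation (hypothetical input). If given,
            Gaussian noise is added to each prediction, so the result approximates
            a prediction interval instead of a confidence interval for the mean.
    """
    y0 = B @ x0  # one prediction per ensemble member (coefficient rows kept intact)
    if sigma is not None:
        rng = np.random.default_rng(rng)
        y0 = y0 + rng.normal(0.0, sigma, size=y0.shape[0])
    alpha = 1.0 - level
    lo, hi = np.quantile(y0, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Synthetic ensemble (illustrative only): 2000 coefficient vectors around (1, 2, -0.5)
rng = np.random.default_rng(0)
B = np.array([1.0, 2.0, -0.5]) + 0.05 * rng.normal(size=(2000, 3))
x0 = np.array([1.0, 0.2, -1.0])

ci = ensemble_interval(B, x0)                     # coefficient uncertainty only
pi = ensemble_interval(B, x0, sigma=0.3, rng=1)   # plus residual noise
```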
(PPS: The regression model presented above is not obtained from a direct regression on a single data set; instead, it is constructed as follows:
- Create an ensemble of N data sets.
- Fitting the regression to each data set yields a linear model of the form given above, so there are N values for each of the coefficients $\beta$.
- The mean of each of the three sets of N coefficient values is calculated.
- These three mean coefficients are the coefficients of the model presented above.
- The goal: find the prediction interval for this averaged model, taking into account that the coefficients $\beta$ are calculated from numerical distributions.)
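The PPS steps can be sketched in NumPy as follows. How the N data sets are created is not stated, so this sketch assumes they are bootstrap resamples (rows drawn with replacement) of a single data set; the function `ensemble_coefficients` is hypothetical. Besides the averaged coefficients it returns the empirical covariance matrix of the N coefficient vectors: its diagonal holds the squared "error bars", and its off-diagonal entries record how the coefficients co-vary, which separate per-coefficient error bars leave out. In $\hat V_f$, the factor $s^2(\mathbf{X}^T\mathbf{X})^{-1}$ is the estimated covariance matrix of the OLS coefficients, so `cov_beta` is the empirical quantity to set beside it; whether `cov_beta` or `cov_beta / N` describes the uncertainty of the *averaged* coefficients depends on how the data sets were generated, and is left as an open assumption here.

```python
import numpy as np

def ensemble_coefficients(X, y, n_sets=1000, rng=None):
    """Sketch of the PPS procedure, ASSUMING the N data sets are bootstrap
    resamples (rows drawn with replacement) of one data set (X, y).

    Returns
      beta_bar : (p,) mean coefficients, i.e. the averaged model
      cov_beta : (p, p) empirical covariance of the N coefficient vectors
      B        : (n_sets, p) all fitted coefficient vectors
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    B = np.empty((n_sets, p))
    for k in range(n_sets):
        idx = rng.integers(0, n, size=n)                         # step 1: one resampled data set
        B[k], *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)   # step 2: fit the linear model
    beta_bar = B.mean(axis=0)                                    # steps 3-4: averaged coefficients
    cov_beta = np.cov(B, rowvar=False)                           # spread and correlation of the betas
    return beta_bar, cov_beta, B

# Synthetic demo data (illustrative only)
rng = np.random.default_rng(1)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

beta_bar, cov_beta, B = ensemble_coefficients(X, y, n_sets=500, rng=0)
error_bars = np.sqrt(np.diag(cov_beta))  # one "error bar" (std. dev.) per coefficient
```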