
I know that the confidence interval of the mean response for OLS is:

$$ \hat{y}(x_0) - t_{\alpha/2,\, df(\text{error})}\sqrt{\hat{\sigma}^2\cdot x_0'(X'X)^{-1}x_0} \;\leq\; \mu_{y|x_0} \;\leq\; \hat{y}(x_0) + t_{\alpha/2,\, df(\text{error})}\sqrt{\hat{\sigma}^2\cdot x_0'(X'X)^{-1}x_0}. $$ (The formula is from Myers, Montgomery, and Anderson-Cook, *Response Surface Methodology*, fourth edition, pages 34 and 407.)

What would be the corresponding confidence interval for WLS?

Many thanks!

1 Answer


$\newcommand{\e}{\varepsilon}$TL;DR: If you have $$ y = X\beta + \e $$ with $\e \sim \mathcal N(0, \sigma^2 \Omega)$ for $\Omega$ known, then the confidence interval for the mean response at $x_0$ is $$ x_0^T\hat\beta \pm t_{\alpha/2, n-p} \sqrt{\hat\sigma^2 x_0^T(X^T\Omega^{-1}X)^{-1}x_0} $$ with $\hat\beta = (X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}y$ and $$ \hat\sigma^2 = \frac{y^T\Omega^{-1}(I-H)y}{n-p}. $$

This doesn't require $\Omega$ to be diagonal (WLS is the special case where it is), but it is essential that $\Omega$ be known.
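
To make the formula concrete, here's a minimal numerical sketch (my own illustration, not from any reference): it simulates data under a known diagonal $\Omega$, computes $\hat\beta$ and $\hat\sigma^2$, and builds the interval at an arbitrary point $x_0$. All the specific numbers (design, $\Omega$, $x_0$) are made up for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])  # toy design matrix
beta_true = np.array([1.0, 2.0])
Omega = np.diag(X[:, 1] + 1.0)   # known Omega (diagonal here, but need not be)
sigma2 = 4.0
y = X @ beta_true + rng.multivariate_normal(np.zeros(n), sigma2 * Omega)

Oinv = np.linalg.inv(Omega)
XtOX_inv = np.linalg.inv(X.T @ Oinv @ X)
beta_hat = XtOX_inv @ X.T @ Oinv @ y                    # GLS estimate
H = X @ XtOX_inv @ X.T @ Oinv                           # GLS hat matrix
sigma2_hat = y @ Oinv @ (np.eye(n) - H) @ y / (n - p)   # y' Oinv (I - H) y / (n - p)

x0 = np.array([1.0, 5.0])                               # arbitrary point of interest
y0_hat = x0 @ beta_hat
se = np.sqrt(sigma2_hat * (x0 @ XtOX_inv @ x0))
t_crit = stats.t.ppf(0.975, df=n - p)                   # alpha = 0.05
print(f"95% CI for the mean at x0: ({y0_hat - t_crit * se:.3f}, {y0_hat + t_crit * se:.3f})")
```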


Here's how to derive this.

The MLE of $\beta$ is $$ \hat\beta = (X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}y $$ (this is just generalized least squares).

From this it follows that $\hat\beta$ is still Gaussian with $$ \text E(\hat\beta) = (X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}\text E(y) = \beta $$ and $$ \text{Var}(\hat\beta) = \sigma^2 (X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}\Omega \Omega^{-1}X(X^T\Omega^{-1}X)^{-1} \\ = \sigma^2 (X^T\Omega^{-1}X)^{-1}, $$ so all together $$ \hat\beta \sim \mathcal N(\beta, \sigma^2 (X^T\Omega^{-1}X)^{-1}). $$
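
As a quick sanity check on this distribution, here's a small Monte Carlo sketch (same made-up setup as the snippet above): the empirical mean and covariance of $\hat\beta$ across simulated datasets should match $\beta$ and $\sigma^2(X^T\Omega^{-1}X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])
beta_true = np.array([1.0, 2.0])
Omega = np.diag(X[:, 1] + 1.0)
sigma2 = 4.0
Oinv = np.linalg.inv(Omega)
XtOX_inv = np.linalg.inv(X.T @ Oinv @ X)
G = XtOX_inv @ X.T @ Oinv          # beta_hat = G @ y

betas = np.array([
    G @ (X @ beta_true + rng.multivariate_normal(np.zeros(n), sigma2 * Omega))
    for _ in range(20_000)
])
print(betas.mean(axis=0))          # close to beta_true
print(np.cov(betas.T))             # close to the line below
print(sigma2 * XtOX_inv)           # theoretical covariance of beta_hat
```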


In order to get the confidence interval for the (conditional) mean $\mu_0$ at a particular point $x_0$, we need to determine the distribution of $$ \frac{\hat y_0 - \mu_0}{\hat {\text{SD}}(\hat y_0)} $$ so we can get the appropriate quantiles.

We have $$ \hat y_0 = x_0^T\hat\beta \sim \mathcal N(x_0^T\beta, \sigma^2 x_0^T(X^T\Omega^{-1}X)^{-1} x_0) $$

so this is $$ \frac{x_0^T\hat\beta - x_0^T\beta}{\sqrt{\hat \sigma^2 x_0^T(X^T\Omega^{-1}X)^{-1} x_0}} = \frac{x_0^T(\hat\beta - \beta) / \sqrt{x_0^T(X^T\Omega^{-1}X)^{-1} x_0}}{\sqrt{\hat \sigma^2}}. $$ The numerator of that is still Gaussian, and $$ \frac{x_0^T(\hat\beta - \beta)}{\sqrt{x_0^T(X^T\Omega^{-1}X)^{-1} x_0}} \sim \mathcal N\left(0, \sigma^2\right). $$ Recall that a t distribution with $d$ degrees of freedom is defined to be $$ \frac{\mathcal N(0,1)}{\sqrt{\chi^2_d / d}} $$ where the numerator and denominator are independent.

We've pretty much got the numerator now, so the remaining questions are (1) what the distribution of $\hat\sigma^2$ is, and (2) whether the numerator and $\hat\sigma^2$ are independent.

We will want to set $$ \hat \sigma^2 = \frac{y^T\Omega^{-1}(I-H)y}{n-p} $$ where $$ H = X(X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1} $$ is the GLS hat matrix.

That $\Omega^{-1}$ looks a little weird in $\hat\sigma^2$ so let's check that this is unbiased. Using a standard result about the expected value of a quadratic form (and noting that actually $\Omega^{-1}(I-H)$ is symmetric), $$ \text E(y^T\Omega^{-1}(I-H)y) = \text{tr}\left(\Omega^{-1}(I-H) \text{Var}(y)\right) + (\text E y)^T \Omega^{-1}(I-H) (\text E y) \\ = \sigma^2 \text{tr} \left(\Omega^{-1}(I-H)\Omega\right) + \beta^TX^T\Omega^{-1}(I-H)X\beta. $$ Using the cyclicity of the trace, that first term becomes $\sigma^2(n-p)$. For the second term, $$ X^T\Omega^{-1}(I-H)X = X^T\Omega^{-1}X - X^T\Omega^{-1}X(X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}X = 0 $$ so $\hat\sigma^2$ is indeed unbiased.
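
The unbiasedness is also easy to see numerically; this sketch (same made-up setup as above) averages $\hat\sigma^2$ over many simulated responses.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])
beta_true = np.array([1.0, 2.0])
Omega = np.diag(X[:, 1] + 1.0)
sigma2 = 4.0
Oinv = np.linalg.inv(Omega)
H = X @ np.linalg.inv(X.T @ Oinv @ X) @ X.T @ Oinv
M = Oinv @ (np.eye(n) - H)          # Omega^{-1} (I - H)

# 20,000 simulated response vectors, one per row
Y = X @ beta_true + rng.multivariate_normal(np.zeros(n), sigma2 * Omega, size=20_000)
s2 = np.einsum('ij,jk,ik->i', Y, M, Y) / (n - p)   # quadratic form per row
print(s2.mean())                    # close to sigma2 = 4.0
```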

Now for its distribution, note that from $y = X\beta + \e$ we could left-multiply through by $\Omega^{-1/2}$ to get $$ \underbrace{\Omega^{-1/2}y}_{:= z} = \underbrace{\Omega^{-1/2}X}_{:= Z}\,\beta + \underbrace{\Omega^{-1/2}\e}_{:= \eta} $$ so this is like a homoscedastic linear model $z = Z\beta + \eta$ with $\eta \sim \mathcal N(0,\sigma^2 I)$. In this case we'd get $H_Z = Z(Z^TZ)^{-1}Z^T$ and from the usual theory (see Cochran's theorem) we get $$ z^T(I-H_Z)z / \sigma^2 \sim \chi^2_{n-p}. $$

Note that $H_Z = \Omega^{-1/2}X(X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1/2}$ so $$ z^T(I-H_Z)z = y^T\Omega^{-1/2}(I - H_Z)\Omega^{-1/2}y \\ = y^T \left[\Omega^{-1} - \Omega^{-1}X(X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}\right]y \\ = y^T\Omega^{-1}(I-H)y = (n-p)\hat\sigma^2 $$ so that shows both how we'd have come up with $\hat\sigma^2$ in the first place, and that $$ \frac{(n-p)\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p}. $$
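
Here's the same identity checked numerically (again a sketch under the made-up setup; $\Omega^{-1/2}$ is computed from the eigendecomposition, which works for any symmetric positive definite $\Omega$).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])
Omega = np.diag(X[:, 1] + 1.0)
sigma2 = 4.0
y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(n), sigma2 * Omega)

# symmetric square root Omega^{-1/2} via eigendecomposition
w, V = np.linalg.eigh(Omega)
Oinv_half = V @ np.diag(w ** -0.5) @ V.T

z, Z = Oinv_half @ y, Oinv_half @ X
H_Z = Z @ np.linalg.inv(Z.T @ Z) @ Z.T                  # hat matrix of whitened model
Oinv = np.linalg.inv(Omega)
H = X @ np.linalg.inv(X.T @ Oinv @ X) @ X.T @ Oinv      # GLS hat matrix

print(z @ (np.eye(n) - H_Z) @ z)        # whitened residual sum of squares
print(y @ Oinv @ (np.eye(n) - H) @ y)   # the same number, = (n - p) * sigma2_hat
```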

This means that $$ \frac{x_0^T\hat\beta - x_0^T\beta}{\sqrt{\hat \sigma^2 x_0^T(X^T\Omega^{-1}X)^{-1} x_0}} = \frac{x_0^T(\hat\beta - \beta) / \sqrt{\sigma^2 x_0^T(X^T\Omega^{-1}X)^{-1}x_0}}{\sqrt{\hat\sigma^2 / \sigma^2}} $$ is exactly of the form $\mathcal N(0,1) / \sqrt{\chi^2_d / d}$.

Independence comes from the unweighted theory, as we know $$ z^T(I-H_Z)z \perp \hat\beta_Z $$ but $$ \hat\beta_Z = (Z^TZ)^{-1}Z^Tz = (X^T\Omega^{-1}X)^{-1}X^T\Omega^{-1}y = \hat\beta $$ and $z^T(I-H_Z)z = (n-p)\hat\sigma^2$ so we don't need to do any extra work.

Thus with much ado we've proved that $$ \frac{x_0^T\hat\beta - x_0^T\beta}{\sqrt{\hat \sigma^2 x_0^T(X^T\Omega^{-1}X)^{-1} x_0}} \sim t_{n-p} $$ so you can just use the usual $t_{\alpha/2, n-p}$ quantiles.
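
And to close the loop, a Monte Carlo coverage check (my own sketch, same made-up setup): the interval built from the $t_{n-p}$ quantiles should cover the true mean $x_0^T\beta$ about 95% of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])
beta_true = np.array([1.0, 2.0])
Omega = np.diag(X[:, 1] + 1.0)
sigma2 = 4.0
Oinv = np.linalg.inv(Omega)
XtOX_inv = np.linalg.inv(X.T @ Oinv @ X)
G = XtOX_inv @ X.T @ Oinv                 # beta_hat = G @ y
M = Oinv @ (np.eye(n) - X @ G)            # Omega^{-1} (I - H)
x0 = np.array([1.0, 5.0])                 # arbitrary point of interest
mu0 = x0 @ beta_true                      # true mean at x0
t_crit = stats.t.ppf(0.975, df=n - p)

covered, n_sims = 0, 10_000
for _ in range(n_sims):
    y = X @ beta_true + rng.multivariate_normal(np.zeros(n), sigma2 * Omega)
    y0_hat = x0 @ (G @ y)
    se = np.sqrt((y @ M @ y / (n - p)) * (x0 @ XtOX_inv @ x0))
    covered += (y0_hat - t_crit * se <= mu0 <= y0_hat + t_crit * se)
print(covered / n_sims)                   # close to 0.95
```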

jld
  • Thank you so much for your answer, it really helped me! Do you know a book or a paper where I could find the equation of the confidence interval for WLS and the proof above? – John Tokka Tacos Oct 05 '18 at 07:37
  • Another question: am I right to assume that $\hat\sigma^2$ is the estimator for the variance at each point and thus is in matrix form, with $size(\hat\sigma^2)=size(\Omega)$? – John Tokka Tacos Oct 05 '18 at 08:05
  • @JohnTokkaTacos I don't know of an official reference and I don't have my usual ones handy to check, but possibly Seber and Lee, or maybe an econometrics book. And what do you mean by $size(\cdot)$? – jld Oct 05 '18 at 14:16
  • Thanks, I will check the reference. My question was whether $\hat\sigma^2$ was a scalar value or a matrix. After further research I found that it is a scalar value. If I know my $\Sigma=\hat\sigma^2 \cdot \Omega$ I can find my scalar value by calculating the weights with $\text{trace}(\Omega)=N$ and $\hat\sigma^2=\frac{1}{N}\sum_i \sigma^2_i$. Can you confirm this? – John Tokka Tacos Oct 05 '18 at 14:27
  • @JohnTokkaTacos I'm still not sure what you mean. $\hat\sigma^2$ is definitely a scalar as it comes from a quadratic form. When I write $\text{Var}(\varepsilon) = \sigma^2 \Omega$ I'm saying that the covariance structure is completely known except for a scaling parameter $\sigma^2$, so after estimating $\sigma^2$ then $\text{Var}(\varepsilon)$ is totally known (or at least, totally estimated). For a particular point $y_i$ you'd have $\text{Var}(y_i) = \sigma^2 \Omega_{ii}$ and for two points $y_i, y_j$ you'd get $\text{Cov}(y_i, y_j) = \sigma^2 \Omega_{ij}$ – jld Oct 05 '18 at 15:06
  • I'll try to be clearer! Say I have a measuring device and I measure my response variable Y as a function of an independent variable X. I measure each point several times, in order to calculate the measurement error as a function of X. Now in my case the error depends on X (the bigger X, the bigger the measurement error), thus I am doing WLS. I can obtain a vector of the different measurement errors for different values of X. This vector will be $Var(\epsilon)$. Now I have to calculate $\Omega$ and $\sigma^2$, the latter being the scaling parameter. By which method should I do this? – John Tokka Tacos Oct 05 '18 at 15:34
  • @JohnTokkaTacos $\varepsilon$ represents the error in every single observation. If you measure $y_1,\dots,y_k$ all with some noise but for the same X value, i.e. $x_1 = \dots = x_k$, you'll still have $\text{Var}(\varepsilon)$ as a $k\times k$ matrix. Also as I've emphasized in my answer, this assumes $\Omega$ is known. So like in WLS if you model $\text{Var}(y_i) = \sigma^2 x_i$ now you'd have $\Omega = \text{diag}(x_1, \dots, x_n)$ (assuming here that the $x_i$ are scalars too) so there's no estimation there. Unless you have measurement errors in X? But that's a different problem – jld Oct 05 '18 at 18:06