2

To assess the "correctness" of a specified regression model, one can compute the variance of the residuals and compare it to the uncertainty $\sigma$ of the data. To do so, it is useful to work with standardized/studentized residuals.

Assuming homoskedasticity, we get:

Standardized residuals: $$ R_{i,Stand} = \frac{R_i}{\hat{\sigma}}$$ Internally studentized residuals: $$ R_{i,ExtStud} = \frac{R_i}{\sqrt{\widehat{Var}(R_i)}} = \frac{R_i}{\sqrt{\hat{\sigma}^2 (1 - H_{ii})}}$$

Externally studentized residuals (these follow a $t$-distribution): $$ R_{i,IntStud} = \frac{R_i}{\sqrt{\hat{\sigma}_{(i)}^2 (1 - H_{ii})}} \sim t(n-p-1)$$
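As a concrete numerical sketch of these three definitions (simulated data and a hypothetical design matrix; the leave-one-out variance $\hat{\sigma}_{(i)}^2$ is computed with the standard algebraic shortcut rather than refitting $n$ times):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 2                               # hypothetical sample size and number of parameters
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
h = np.diag(H)                             # leverages H_ii
R = y - H @ y                              # OLS residuals
sigma2_hat = R @ R / (n - p)               # \hat{\sigma}^2

# Standardized residuals
r_stand = R / np.sqrt(sigma2_hat)

# Internally studentized: estimated variance of R_i uses all the data
r_int = R / np.sqrt(sigma2_hat * (1 - h))

# Externally studentized: \hat{\sigma}_{(i)}^2 omits observation i
sigma2_loo = ((n - p) * sigma2_hat - R**2 / (1 - h)) / (n - p - 1)
r_ext = R / np.sqrt(sigma2_loo * (1 - h))
```

The external version can also be obtained directly from the internal one via the identity $t_i = r_i \sqrt{\frac{n-p-1}{n-p-r_i^2}}$, which the code above reproduces.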

Now I found some literature stating that $Var(R_{Stand})$ should equal 1, and other literature stating that $Var(R_{IntStud})$ or $Var(R_{ExtStud})$ should equal 1.

Which one is it?

  • plug both to the variance equation and see, think of what is $\hat\sigma$ – Aksakal May 15 '20 at 12:51
  • You mean $Var(X) = E[X^2] - E[X]^2$? I know that $\hat{\sigma} = \sqrt{\frac{\sum R_i^2}{\nu}}$ and $\frac{(n-p-1)\hat{\sigma}^2}{\sigma^2}\sim \chi^2_{n-p-1}$, but I am stuck at this point... – John Tokka Tacos May 15 '20 at 20:56

2 Answers


If you're talking about the variance of the random variable (taken with the explanatory variables fixed and the response as a random variable) then none of these have exactly unit variance. What we can legitimately say under the linear regression model with OLS estimation is that:

$$\begin{align} \mathbb{V}(R_i | \mathbf{x}) &= \sigma^2 (1-h_{i,i}), \\[6pt] \mathbb{E}(\hat{\sigma}^2 | \mathbf{x}) &= \sigma^2, \\[6pt] \mathbb{E}(\hat{\sigma}_{(i)}^2 | \mathbf{x}) &= \sigma^2. \\[6pt] \end{align}$$

Moreover, under broad limiting conditions we also know that there is convergence $\hat{\sigma}_{(i)}^2 \rightarrow \hat{\sigma}^2 \rightarrow \sigma^2$ as $n \rightarrow \infty$. From these results we can show that the variance of the standardised/studentised residuals approaches one asymptotically under the required limiting conditions. However, from these results, it does not follow that any of the ratios you give have exact unit variance for any finite $n$.

In fact, the only case where we can calculate an exact distribution (and an exact variance) that is independent of the explanatory variables is for the externally studentised residuals,$^\dagger$ where we have:

$$R_{i,\text{ExtStud}} = \frac{R_i}{\sigma_{(i)} \sqrt{1-h_{i,i}}} \sim \text{St}(n-p-1).$$

In this case, using the variance of the Student's T distribution we have the exact variance:

$$\mathbb{V}(R_{i,\text{ExtStud}} | \mathbf{x}) = \frac{n-p-1}{n-p-3} \quad \quad \quad \quad \quad \text{if } n-p>3.$$

We can see that $\mathbb{V}(R_{i,\text{ExtStud}} | \mathbf{x}) \rightarrow 1$ as $n \rightarrow \infty$, but the variance is not one for any finite $n$. In the other two cases the exact conditional distribution of the residual (given the explanatory variables) is complicated and depends on the explanatory variables. However, as previously noted, we have convergence to unit variance in all cases under broad limit conditions.
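A quick Monte Carlo check of this exact variance (a sketch with simulated normal errors and a hypothetical design matrix; residuals depend only on the errors, so $R = (I-H)\varepsilon$ lets us vectorize over replications):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 2                               # small n so the excess variance is visible
X = np.column_stack([np.ones(n), rng.normal(size=n)])
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
i = 0                                      # examine the first residual

m = 200_000                                # Monte Carlo replications
E = rng.normal(size=(m, n))                # errors with sigma = 1
R = E @ (np.eye(n) - H)                    # residuals: (I - H) applied to the errors
s2 = (R**2).sum(axis=1) / (n - p)
s2_loo = ((n - p) * s2 - R[:, i]**2 / (1 - h[i])) / (n - p - 1)
t_i = R[:, i] / np.sqrt(s2_loo * (1 - h[i]))

emp_var = t_i.var()
theory = (n - p - 1) / (n - p - 3)         # = 7/5 = 1.4 for n = 10, p = 2
```

For $n=10$, $p=2$ the theoretical variance is $7/5 = 1.4$, and the empirical variance lands close to that value, noticeably above one.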


$^\dagger$ You have your notation for internal and external studentisation around the wrong way; it is external if the variance is estimated without use of the present data point.

Ben

When you standardize, it's a mechanical operation; there's no $\chi^2$ or other fun stuff. It's just $r_i=\frac{R_i-\bar R}{s}$, hence, if you take the variance of this thing, you get $var[r_i\mid R]=\frac{var[R_i\mid R]}{s^2}=\frac{s^2}{s^2}=1$.

Here, $s^2$ is the sample variance, and although it's in the denominator, it does NOT lead to a $\chi^2$ because it appears in a conditional expression. That's great, because for a $\chi^2$ you need normality assumptions, whereas here we don't need any distributional assumption whatsoever. We don't even need the formulae given here, because you can derive the same result by simple arithmetic; no probability theory is needed at all.
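To see the purely mechanical point numerically (a sketch using deliberately non-normal "residuals", since no distributional assumption is involved):

```python
import numpy as np

rng = np.random.default_rng(2)
R = rng.exponential(size=25)          # deliberately non-normal "residuals"
s = np.sqrt(R.var())                  # sample standard deviation (same divisor as .var())
r = (R - R.mean()) / s                # mechanically standardized

# the sample variance of r is 1 (up to floating point), by arithmetic alone
print(r.var())
```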

Aksakal