
I've seen the homoskedasticity assumption stated as the constant conditional variance of the error (i.e., Var(u|x)=constant).

I was wondering if I can also state the homoskedasticity assumption as constant variance of the dependent variable, conditional on the independent variable (i.e., Var(y|x)=constant)?

I remember a professor once told me that they were essentially the same thing and that you could state either one and they would mean the same. However, I have been looking around the internet and cannot find information on the matter.

StatsScared
    If you define the "error" as $U = Y - f(x)$ for the (correct but unknown) model $f$, how do you suppose the variance of $Y|X$ could differ from that of $U|X$? Is this indeed what you mean by "error"? – whuber Jul 28 '14 at 18:29

3 Answers


Given the usual linear regression model, (vector matrix notation for a sample of size $n$)

$$\mathbf y = \mathbf X\beta + \mathbf u$$

where $\mathbf u$ is an unknown stochastic "error/disturbance" term, we make various (and varying) additional a priori assumptions, and for each set of them we examine what properties various estimators have.

The OP refers to the "conditional homoskedasticity of the error" assumption, under which the variance of $\mathbf u$ conditional on the regressor matrix $\mathbf X$ is assumed constant:

$$\operatorname{Var}(\mathbf u\mid \mathbf X) = \sigma^2\mathbf \Omega$$

where the diagonal elements of $\mathbf \Omega$ are equal to unity (the off-diagonal elements can be non-zero; we do not treat the issue of autocorrelation here). Note that the "variance" of a vector here denotes the variance-covariance matrix of the vector. This means that for every $i=1,...,n$ the conditional variance of the error term is the same (which we summarily state by saying that it is "constant").

Now, under the assumed specification we also have

$$\operatorname{Var}(\mathbf y\mid \mathbf X) = \operatorname{Var}(\mathbf X\beta + \mathbf u\mid \mathbf X)$$

Since we condition on $\mathbf X$, $\mathbf X$ is treated as a constant. Moreover, in the context of classical/frequentist statistics, the unknown coefficient vector $\beta$ is also treated as a constant. So $\mathbf X\beta$ is a constant (conditionally on $\mathbf X$), and therefore it does not affect the conditional variance. Hence

$$\operatorname{Var}(\mathbf y\mid \mathbf X) = \operatorname{Var}(\mathbf X\beta + \mathbf u\mid \mathbf X) = \operatorname{Var}(\mathbf u\mid \mathbf X)$$

So the answer is "yes" (note that conditional homoskedasticity implies unconditional homoskedasticity, but not vice-versa).
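As an illustration (a minimal simulation sketch of my own, with arbitrary illustrative numbers, not part of the formal argument): if we generate data from a homoskedastic linear model, the sample variance of $y$ at each value of $x$ matches the sample variance of $u$ at that value.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    x = rng.choice([1.0, 2.0, 3.0], size=n)   # repeated x values so Var(y|x) can be estimated directly
    beta0, beta1, sigma = 1.0, 2.0, 0.5
    u = rng.normal(0.0, sigma, size=n)        # homoskedastic error: Var(u|x) = sigma^2 for every x
    y = beta0 + beta1 * x + u

    for xv in (1.0, 2.0, 3.0):
        mask = x == xv
        print(xv, y[mask].var(), u[mask].var())  # both close to sigma^2 = 0.25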

In fact, the classical linear regression model with the benchmark set of assumptions of "spherical disturbances" (conditionally homoskedastic and non-autocorrelated), extended to regressors that are stochastic but strictly exogenous, can be compactly specified without an error term in sight:

$$\begin{align} &E(\mathbf y \mid\mathbf X) = \mathbf X \beta\\ &\operatorname{Var}(\mathbf y\mid \mathbf X) = \sigma^2\mathbf I\\ &\mathbf X \;\text {is of full column rank} \end{align}$$

The first line incorporates the linear specification assumption, and the assumption that anything else that may affect $\mathbf y$ has an expected value equal to $0$, conditional on $\mathbf X$. The second line incorporates the assumption about the "error term" being conditionally homoskedastic and non-autocorrelated. The last line is the "no-perfect collinearity of the regressors" assumption.
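To illustrate this error-free formulation (again a minimal sketch of my own, with hypothetical values): we can simulate $\mathbf y$ directly from its conditional mean and variance, with no error term constructed anywhere, and OLS still recovers $\beta$.

    import numpy as np

    rng = np.random.default_rng(1)
    n, sigma = 500, 0.5
    X = np.column_stack([np.ones(n), rng.uniform(0.0, 5.0, size=n)])  # full column rank
    beta = np.array([1.0, 2.0])

    # Draw y | X with E(y|X) = X beta and Var(y|X) = sigma^2 I -- no error term in sight
    y = rng.normal(loc=X @ beta, scale=sigma)

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta_hat)  # close to [1.0, 2.0]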

Alecos Papadopoulos

From Jeffrey Wooldridge's Introductory Econometrics: A Modern Approach:

When Var(u|x) depends on x, the error term is said to exhibit 
heteroskedasticity (or nonconstant variance). 
Because Var(u|x) = Var(y|x), heteroskedasticity is present
whenever Var(y|x) is a function of x.

... Just throwing this out there for those looking for a less technical answer than the previous two...

Steve S

As @whuber hints, these are not quite the same, but in practice we assess the conditional variance of the response variable ${\rm Var}(y|x)$. The key issue is that of the errors versus the residuals. It is a very subtle topic. The errors are the deviations of observed data from the expected values, $y_i - {\rm E}[y_i|x_i]$, whereas the residuals are the differences between the observed values and the model's predicted values, $y_i - \hat y_i$. Importantly, $\hat y_i$ only necessarily equals ${\rm E}[y_i|x_i]$ 'at' infinity (and even then only if the estimator is consistent); in finite samples, they are almost certainly different. Thus, the residuals you have are not identical to the true errors. Moreover, the assumption of homoscedasticity pertains to the errors, not the residuals.

However, when we want to check the assumption, we don't have access to the true errors, so we use the residuals instead. (Even this part is more complicated, because the observations typically have different amounts of leverage and so the residuals don't all have the same standard deviation. As a result, we don't check the raw residuals that I mentioned above, but the standardized residuals instead. For more about that it may help to read my answer here: Interpreting plot.lm.)
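For concreteness, here is a minimal sketch (my own, with made-up data, not the code behind plot.lm itself) of the standardization step described above: the raw residuals are rescaled using their leverages so that they share a common standard deviation before being inspected against the fitted values.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200
    X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, size=n)])
    y = X @ np.array([1.0, 0.5]) + rng.normal(0.0, 1.0, size=n)

    # OLS fit, fitted values, and raw residuals
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat
    resid = y - fitted

    # Leverages: diagonal of the hat matrix H = X (X'X)^{-1} X'
    leverage = np.diag(X @ np.linalg.solve(X.T @ X, X.T))

    # Internally standardized residuals: comparable spread even when leverages differ
    s2 = resid @ resid / (n - X.shape[1])
    std_resid = resid / np.sqrt(s2 * (1.0 - leverage))

    # Plot std_resid against fitted and look for a fan/funnel pattern,
    # which would suggest heteroskedasticity of the errors.
    print(std_resid.std())  # roughly 1 under homoskedasticity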

gung - Reinstate Monica
    +1 for making a clear and useful distinction. However, it is difficult to see how a model assumption could apply to the *residuals* whose probability distribution, after all, depends on the very method used to estimate the model. As far as I can tell, about the only sensible way to interpret the homoskedasticity assumption is in terms of the errors. Assuming the errors are *additive,* it is immediate that their variance equals the conditional variance of the response variable. – whuber Jul 28 '14 at 20:22
  • @whuber, you're right. I wasn't being clear about the nature of the assumption. I added a few more details. I hope there isn't room for misunderstanding now. – gung - Reinstate Monica Jul 29 '14 at 02:34