When conducting various statistical tests, why do we expect equality of variances/homoscedasticity/sphericity etc.?
The "most natural" null hypothesis is that the compared (or implicitly compared, as in regression) samples come from the _same_ population. Hence under this H0 all their distributional parameters are assumed to be equal. – ttnphns May 15 '13 at 06:37
Relevant: [Unequal variance in randomized experiments to compare treatment with control?](https://stats.stackexchange.com/questions/434928/unequal-variance-in-randomized-experiments-to-compare-treatment-with-control) – kjetil b halvorsen Dec 25 '21 at 02:12
3 Answers
If you're talking about statistical tests of variance between two or more populations: I don't think we expect it, but rather we state it in the null hypothesis so we can calibrate a statistical test against the alternative, as is the common case in NHST. If indeed we claim that $\mathcal{H}_0: \sigma^2_1 = \sigma^2_2$, then we can go on to devise a family of statistical tests based on a test statistic $T=f(\hat{\sigma}^2_1, \hat{\sigma}^2_2)$ which has a known limiting distribution under $\mathcal{H}_0$.
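As a concrete sketch of one such test (my illustration with simulated data, not something the answer prescribes): the classical variance-ratio F-test takes $T = \hat{\sigma}^2_1 / \hat{\sigma}^2_2$, which follows an $F(n_1-1, n_2-1)$ distribution under $\mathcal{H}_0$ when both samples are independent and normal.

```python
# Sketch: variance-ratio F-test of H0: sigma1^2 == sigma2^2.
# The data below are simulated purely for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=10.0, scale=2.0, size=30)
b = rng.normal(loc=10.0, scale=2.0, size=40)

s2_a = np.var(a, ddof=1)            # unbiased sample variances
s2_b = np.var(b, ddof=1)
F = s2_a / s2_b                     # test statistic T = f(sigma1_hat^2, sigma2_hat^2)
df1, df2 = len(a) - 1, len(b) - 1

# Two-sided p-value from the F(df1, df2) null distribution
p = 2 * min(stats.f.cdf(F, df1, df2), stats.f.sf(F, df1, df2))
print(F, p)
```

(Levene's or Brown-Forsythe's test, `scipy.stats.levene`, is a more robust choice when normality is doubtful.)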
On the other hand, if you refer to the "expectation" of constant variance as it is stated as an assumption in linear regression or the t-test, then I would agree with you that it is a ludicrous assumption, and you would do better to favor robust alternatives.
The t-test has correct size when using the Satterthwaite effective degrees of freedom, and it is a very powerful and robust test regardless of the sampling distribution of the data in either group. I'm convinced the only reason introductory statistics courses teach the equal-variance case is for heuristic purposes, not because it's actually practical.
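For illustration (a sketch on simulated data, not taken from the answer), Welch's unequal-variance t-test with Satterthwaite degrees of freedom is what `scipy.stats.ttest_ind` does when `equal_var=False`:

```python
# Sketch: Welch's t-test (Satterthwaite df) vs. the pooled equal-variance t-test
# on simulated groups with the same mean but very different spread.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=25)
y = rng.normal(loc=0.0, scale=3.0, size=60)      # same mean, larger variance

welch = stats.ttest_ind(x, y, equal_var=False)   # Welch/Satterthwaite
pooled = stats.ttest_ind(x, y, equal_var=True)   # classic equal-variance test
print(welch.pvalue, pooled.pvalue)
```

With unequal variances and unequal group sizes, the pooled test's size can drift away from the nominal level, while the Welch version stays close to it.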
In linear regression, having non-constant variance means that the standard errors of the estimated model coefficients are incorrect. These can be corrected by using heteroscedasticity-consistent (robust, Huber-White) "sandwich" standard error estimates. This is only inefficient in some very small sample sizes, but in general, robust standard error estimates are far superior to model-based estimates, both for their larger array of applications and for the improved interpretation of model coefficients.
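A minimal sketch of that correction (my example with simulated heteroscedastic data; HC3 is one of several sandwich variants available in statsmodels):

```python
# Sketch: OLS with heteroscedasticity-consistent (Huber-White "sandwich")
# standard errors via statsmodels, on simulated data whose noise grows with x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3 * x, size=n)   # non-constant variance

X = sm.add_constant(x)
classical = sm.OLS(y, X).fit()              # model-based standard errors
robust = sm.OLS(y, X).fit(cov_type="HC3")   # sandwich (HC3) standard errors
print(classical.bse)
print(robust.bse)
```

The point estimates are identical in both fits; only the standard errors (and hence the tests and intervals) change.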

+1, this is a very good & very comprehensive answer. I must note that your discussion of constant variance for the t-test is too strong for my taste, though. I don't see it as impossible or "ludicrous" that the variance remain constant, even if there are many occasions where it would certainly change if the mean changes. – gung - Reinstate Monica May 15 '13 at 14:29
I don't think equality of variance is natural in all circumstances; indeed we know many situations where it isn't the case.
However, in situations where a shift in the mean doesn't otherwise affect the distribution of the values, then you would expect to see it.
The assumption is sometimes one of convenience: the distribution of test statistics under the null, properties useful for computing confidence intervals, and so on tend to be more tractable.

This stems from your basic regression model.
Two things.
1:
In the context of regression we assume we have a perfect, complete model ready to go, one which correctly explains all covariability in the observed data for all variables.
Implicit in this is that the model is assumed constant for all observed data. Intuitively we tend to think of a time series, but the really basic OLS model mostly assumes cross-sectional data.
As a consequence, the error term that is left is nonsystematic. So it is not assumed to be some sort of bias or measurement error, because a perfect model would correct for that (the argument is actually not that the model could, but that such systematic error simply doesn't exist and the model is therefore right, but anyway).
So let's think about this: why should a truly random, nonsystematic error be heteroscedastic? At the very least this would imply some sort of "system" to the error. It cannot be from the measurement, it cannot be anything we have control over. We assume we have all differences between the data points included in our model. Therefore the consequence is that all that is left is what we cannot account for, a truly random influence on this cross-sectional event. This should, intuitively, be the same process for all data points.
An example: we put different animals into our measurement machine and measure. All external influences are identical, and we account for all differences between the animals themselves. What is then left? It must be the same identical random process inherent in the measurement procedure. If there were, for example, a slight degradation of our sensors after 100 measurements which increased the variance, we literally assume our model includes this (a small simulation of this scenario is sketched at the end of this point).
This is of course completely bonkers in reality and there is no reason to assume this at all. But it is the mathematical argument behind this.
So in a perfect world, yeah, homoscedasticity makes sense.
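To make the sensor-degradation example above concrete (a simulation sketch of my own, using the Breusch-Pagan test as one possible diagnostic; none of this is prescribed by the answer):

```python
# Sketch: residual variance drifts upward with the measurement index
# ("sensor degradation"), and a Breusch-Pagan test against that index
# flags the violation of constant variance.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(3)
n = 300
index = np.arange(n)                     # order in which animals were measured
weight = rng.uniform(1, 5, size=n)       # the covariate the model accounts for
noise_sd = 0.2 + 0.01 * index            # sensor noise grows over time
y = 2.0 + 1.5 * weight + rng.normal(scale=noise_sd, size=n)

fit = sm.OLS(y, sm.add_constant(weight)).fit()
exog_het = sm.add_constant(index)        # candidate driver of the variance
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, exog_het)
print(lm_pvalue)                         # small p-value: homoscedasticity rejected
```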
2:
The obvious reason is that it makes learning the basics of regression, intervals, and testing (F/t/etc.) easy. How are you going to learn about hypothesis tests if you have to understand a White heteroscedasticity-consistent (HC) estimator at the same time...
It also goes to show a basic premise of OLS which is often forgotten: it is assumed the model is truly correct. As this goes so completely against reality, there are many adjustments to be made, such as allowing for heteroscedastic error terms, but the ideal theory assumes it. In practice, many people just use these tests without understanding the implications...

You could say this is a somewhat philosophical point though, as even the mathematical mechanics of OLS don't really work, even theoretically, to include all this data. So the argument above is technically reversed: the model is correct because we just assume it, NOT because we assume it CAN include all variability. The logical conclusion is of course the same. – IMA May 15 '13 at 08:58