I was looking into ways to compare two datasets (how different they are in shape, spread, mean, variance, etc.) and came across the F test. If I understood correctly, it uses a null hypothesis to check whether two normally distributed populations have the same variance. What I don't understand is how this differs from just computing the variance of each dataset and comparing them.
-
The F test does exactly what you say in your last sentence so it is rather hard to see what your problem is. – mdewey Jan 08 '18 at 16:38
-
The F test lets you know if the difference in variance between the two datasets is significant. – carlos Jan 08 '18 at 16:48
-
@carlos: Why is this better than just computing the variance for each population and comparing it? – djWann Jan 08 '18 at 16:55
-
@djWann The sample variances are random variables; you get one observation of these variables when you compute them. To check whether the difference between samples is caused by pure randomness or by a real difference in variance between the populations, you form the random variable $F = S^2_a/S^2_b$. If the value of $F$ that you get from the sample variances is "uncommon", you reject $H_0$ and say that the variances are different. – carlos Jan 08 '18 at 17:40
-
If your datasets are your whole populations and you are only interested in comparing how different they are in spread, then you are right that just numerically comparing their variances is enough (this is called descriptive statistics). If, on the other hand, they are two random samples out of some larger populations, then an F test is necessary (this is called inferential statistics). If the samples are not random, then even an F test may not be the right thing to do. – Zahava Kor Jan 08 '18 at 19:58
-
@ZahavaKor That's a really easy-to-understand explanation. If you want, please write an answer and I will be happy to accept it. – djWann Jan 09 '18 at 09:53
1 Answer
If your datasets are your whole populations and you are only interested in comparing how different they are in spread, then you are right that just numerically comparing their variances is enough (this is called descriptive statistics). In this case it is up to you to decide whether the difference between the variances is small or large; as far as I know, there is no mathematical way to determine this. Note also that in this case you divide the sum of squared differences by n and not by n-1 (although for a large enough n the difference between dividing by n and by n-1 is minuscule).

If, on the other hand, they are two random samples out of some larger populations, then an F test is necessary (this is called inferential statistics). If the samples are not random, then even an F test may not be the right thing to do.
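The descriptive/inferential distinction above can be sketched in Python; this is a minimal illustration with made-up data (the sample sizes, seeds, and variable names are all hypothetical), not part of the original answer:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=30)  # hypothetical sample A
b = rng.normal(loc=0.0, scale=1.5, size=25)  # hypothetical sample B

# Descriptive statistics: if a and b ARE the whole populations,
# divide by n (ddof=0) and simply compare the two numbers.
pop_var_a = np.var(a, ddof=0)
pop_var_b = np.var(b, ddof=0)

# Inferential statistics: if a and b are random samples, use the
# n-1 divisor (ddof=1) and form the F statistic from the sample variances.
s2_a = np.var(a, ddof=1)
s2_b = np.var(b, ddof=1)
F = s2_a / s2_b
df_a, df_b = len(a) - 1, len(b) - 1

# Two-sided p-value from the F distribution: reject H0 (equal
# variances) if p falls below the chosen significance level.
p = 2 * min(stats.f.cdf(F, df_a, df_b), stats.f.sf(F, df_a, df_b))
print(F, p)
```

The `ddof` argument makes the n versus n-1 divisor explicit: `ddof=0` is the population variance, `ddof=1` the unbiased sample estimator.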

-
It might be helpful to point out that the F test is rather less popular these days. See for example this Q&A https://stats.stackexchange.com/questions/24022/why-levene-test-of-equality-of-variances-rather-than-f-ratio#24024 – mdewey Jan 09 '18 at 16:59
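For reference, the Levene test discussed in the linked Q&A is available directly in SciPy; a minimal sketch with made-up data (the samples and seed are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=40)  # hypothetical samples
b = rng.normal(0.0, 2.0, size=40)

# Levene's test of equal variances; unlike the F test it does not
# assume normality (center='median' gives the Brown-Forsythe variant).
stat, p = stats.levene(a, b, center='median')
print(stat, p)
```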