When should an F-test be one sided vs two sided?

Question

If we increase both numerator and denominator degrees of freedom of the F distribution, then the pdf narrows down on the value 1.

This suggests that a two-sided test is reasonable: If we have high degrees of freedom, then finding e.g. our test statistic $F=0.034$ is extremely unlikely: not only high, but also low values are unlikely in this case.
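The concentration around 1 can be checked numerically. The following is a minimal sketch using `scipy.stats.f`; the helper name `central_mass`, the interval $[0.5, 2]$ and the chosen degrees of freedom are illustrative assumptions, not part of the original question.

```python
from scipy.stats import f

def central_mass(dfn, dfd, lo=0.5, hi=2.0):
    """Probability that an F(dfn, dfd) variable falls in [lo, hi]."""
    return f.cdf(hi, dfn, dfd) - f.cdf(lo, dfn, dfd)

# Illustrative degrees of freedom (equal in numerator and denominator).
for df in (5, 50, 500):
    mass_near_one = central_mass(df, df)
    # Lower-tail probability of seeing a value as small as F = 0.034.
    lower_tail = f.cdf(0.034, df, df)
    print(f"df={df}: P(0.5 <= F <= 2) = {mass_near_one:.4f}, "
          f"P(F <= 0.034) = {lower_tail:.3e}")
```

As the degrees of freedom grow, the mass in the interval around 1 approaches 1 and the lower-tail probability at 0.034 shrinks towards zero.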

Yet I have not yet seen a textbook use a two-sided F-test. Instead, we reject the null hypothesis only when the test statistic is very large.

Why don’t textbooks use a two sided test?
When is a two sided test reasonable for the F test, and when is a one sided test reasonable?

The upper tail distribution is used for omnibus tests for difference, but the lower tail distribution is used for omnibus tests of equivalence. — Alexis, May 16 '18 at 16:16

score 3 · Answer 1 · answered May 16 '18 at 21:05

In the context of ANOVA, the F ratio puts the between-group variance in the numerator and the within-group variance in the denominator. You only care whether the ratio is > 1 (if it were not, you would not even need to conduct a test). Therefore you always have a directional hypothesis and thus a 1-tailed test. Even in other uses of the F ratio, it is standard to put the larger variance in the numerator, so again it becomes 1-tailed.
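To illustrate the ANOVA case, here is a minimal sketch that builds the F ratio by hand and takes the p-value from the upper tail only. The data and the helper name `anova_f` are made up for illustration; `scipy.stats.f_oneway` is used only as a cross-check.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Hypothetical data: three groups of five observations each.
groups = [
    np.array([4.1, 5.0, 4.7, 5.3, 4.9]),
    np.array([5.8, 6.1, 5.5, 6.4, 6.0]),
    np.array([4.8, 5.2, 5.1, 4.6, 5.4]),
]

def anova_f(groups):
    """Return (F, dfn, dfd) for a one-way ANOVA on a list of 1-D arrays."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    dfn, dfd = k - 1, n - k
    f_stat = (ss_between / dfn) / (ss_within / dfd)
    return f_stat, dfn, dfd

f_stat, dfn, dfd = anova_f(groups)
p_upper = f.sf(f_stat, dfn, dfd)  # upper-tail area only

result = f_oneway(*groups)        # library cross-check
print(f_stat, p_upper)
print(result.statistic, result.pvalue)
```

The hand-computed upper-tail p-value should agree with the one reported by `f_oneway`, which reflects that only large values of the ratio count as evidence against equal group means.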

HEITZ

Simple and straightforward answer. No unnecessary gobbledygook. – john Jun 26 '20 at 15:54

score 1 · Answer 2 · answered May 17 '18 at 01:18

Every classical hypothesis test has a test statistic, and an implicit ordering on what values of the test statistic constitute more or less evidence for the alternative hypothesis. To understand the appropriate "tail area" for your critical region or p-value, you need to understand the ordering-of-evidence inherent in the test statistic. The test statistic and its (often implicit) ordering effectively defines a total order on the set of all possible observed outcomes, which tells you what outcomes constitute more or less evidence for the alternative hypothesis.

In hypothesis tests where the null-distribution is an F-distribution (i.e., the test statistic has an F-distribution, conditional on the null hypothesis being true), it is usually the case that the test statistic is a positive measure of deviation away from the outcome least conducive to the alternative hypothesis. In these cases a test statistic of zero represents the least possible evidence for the alternative, and higher values represent more evidence for the alternative. In this case, the p-value of the test is obtained solely from the upper tail of the F-distribution (i.e., outcomes that are at least as conducive to the alternative hypothesis as some cut-off outcome) and so the test is "one-sided".

An important thing to remember in hypothesis testing is that the test statistic and its null distribution are not sufficient to describe the test. You also need to know the evidentiary ordering of the test statistic that describes whether an outcome is more or less conducive to the alternative. Often this is not stated explicitly, and you need to look at the nature of the test statistic to figure it out.
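As a rough sketch of this point, the same observed statistic and the same null distribution can give very different p-values depending on which ordering of evidence is assumed. The function name `f_p_values` is hypothetical, and the two-sided value uses one common convention (doubling the smaller tail area, capped at 1), which is an assumption rather than the only possible definition.

```python
from scipy.stats import f

def f_p_values(f_stat, dfn, dfd):
    """P-values for one statistic under three evidentiary orderings."""
    upper = f.sf(f_stat, dfn, dfd)   # large F is evidence for the alternative
    lower = f.cdf(f_stat, dfn, dfd)  # small F is evidence for the alternative
    # Assumed convention: double the smaller tail, capped at 1.
    two_sided = min(1.0, 2.0 * min(upper, lower))
    return {"upper": upper, "lower": lower, "two_sided": two_sided}

# A very small statistic with large degrees of freedom.
print(f_p_values(0.034, 50, 50))
# A fairly large statistic with the same degrees of freedom.
print(f_p_values(2.5, 50, 50))
```

With the usual upper-tail ordering, the small statistic gives a p-value close to 1 and the null is not rejected, even though such a value is very improbable under the null distribution.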

Thank you. Let’s say, though, that we are doing some test where a very high F value is conducive to the alternative hypothesis and an F very close to zero is not, and that both degrees of freedom are very large, so that a value of F=0.000000001 is very unlikely under the null but is also not conducive to the alternative. Then surely, if we find F=0.00000001, we would be wise to consider that maybe a third hypothesis is correct, and both the null and the alternative are incorrect? A reasonable statistician would surely try to find out what that third hypothesis might be? — user56834, May 17 '18 at 06:47
You aren't allowed to change your hypotheses mid-way through a hypothesis test, so no "third hypothesis" is relevant. If low values of the test statistic F are less conducive to the specified alternative hypothesis (as you have described) then a low observed value will yield a high p-value (calculated using only the upper tail) and you will not reject the null. If you then decide that a different kind of test is warranted, that is a separate issue. — Ben, May 17 '18 at 07:03
