Prove that $P(X \le a) + P\{Y \le \frac{1}{a}\} = 1$

Question

Prove that if $X$ has the F-distribution with $(m, n)$ d.f. and $Y$ has the F-distribution with $(n, m)$ d.f., then for every $a > 0$, $$ P(X \le a) + P\left\{Y \le \frac{1}{a}\right\} = 1 $$ I tried solving $P(X \le a)$ by integration but it was too difficult for me. How do I approach questions like these in general?

Welcome to Cross Validated! Please add the self-study tag, read its wiki, and post what progress you’ve made so far with your homework assignment. — Dave, Nov 06 '20 at 04:10
Use the properties of the F-distribution. In particular, if $X \sim F_{m,n}$, what is the distribution of the reciprocal $Y=\frac{1}{X}$? — hard2fathom, Nov 06 '20 at 04:22

BruceET · Answer 1 · 2020-11-06T08:58:12.330

I will give you some clues toward a proof and, I hope, some idea why this equality is of interest.

Definition and simulation. First, the distribution $\mathsf{F}(m,n)$ is defined as the distribution of the ratio of two independent chi-squared random variables, with $m$ and $n$ degrees of freedom respectively, each divided by its degrees of freedom (so that numerator and denominator both have mean $1).$ It follows that if $X \sim \mathsf{F}(m,n),$ then $Y = 1/X \sim \mathsf{F}(n,m).$
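The step from this definition to the identity in the question can be written out explicitly. This is only a sketch: $U$ and $V$ are names introduced here for the two independent chi-squared variables, and the last equality relies on the F distribution being continuous, so that $P(Y = 1/a) = 0.$

```latex
\begin{aligned}
X &= \frac{U/m}{V/n}, \quad U \sim \chi^2_m,\ V \sim \chi^2_n \text{ independent}
  \quad\Longrightarrow\quad \frac{1}{X} = \frac{V/n}{U/m} \sim \mathsf{F}(n,m), \\
P(X \le a) &= P\left(\frac{1}{X} \ge \frac{1}{a}\right)
  = P\left(Y \ge \frac{1}{a}\right)
  = 1 - P\left(Y \le \frac{1}{a}\right) \qquad (a > 0).
\end{aligned}
```

Because the identity involves only the distributions of $X$ and $Y,$ it holds for any $Y \sim \mathsf{F}(n,m),$ whether or not $Y$ is literally $1/X.$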

set.seed(2020)
Num = rchisq(10^6, 10)/10
Den = rchisq(10^6, 15)/15
F.rat = Num/Den
summary(F.rat)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.02225  0.65241  0.97911  1.15450  1.45158 21.39064
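The reciprocal claim can be checked with the same kind of simulation. The following is a minimal sketch, not part of the original code: the names `x`, `y`, `probs`, `emp`, and `theo` are illustrative, and a smaller sample is used to keep it quick.

```r
# Sketch: reciprocals of simulated F(10, 15) values should follow F(15, 10).
set.seed(2020)
x <- (rchisq(10^5, 10) / 10) / (rchisq(10^5, 15) / 15)  # X ~ F(10, 15)
y <- 1 / x                                              # claim: Y ~ F(15, 10)

probs <- c(0.10, 0.25, 0.50, 0.75, 0.90)
emp   <- quantile(y, probs, names = FALSE)  # empirical quantiles of 1/X
theo  <- qf(probs, 15, 10)                  # exact quantiles of F(15, 10)
print(round(rbind(empirical = emp, theoretical = theo), 3))
```

The two rows should agree up to simulation error, which shrinks as the number of simulated values grows.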

The graph below shows a histogram of the simulated values of F.rat along with the density function of $\mathsf{F}(10, 15).$ The positions of quantiles $0.025$ and $0.975$ are shown by vertical red lines.

hdr="Simulated Values of F(10,15) with Density"
hist(F.rat, prob=T, br=50, col="skyblue2", ylim=c(0,.9), main=hdr)
 curve(df(x,10,15), add=T, col="brown", lwd=2, n=1001)
 abline(v=qf(c(.025,.975), 10, 15), col="red")

Motivation: Printed tables of F-distributions. Typically, printed tables of F-distributions give just a few 'percentage points' for each pair of degrees of freedom: in my table, 'percentage points' $.25, .1, .05, .025, .001,$ which cut those probabilities from the upper tail of each distribution. These amount to quantiles $.75, .9, .95, .975, .999.$ These values are chosen because of their frequent use in making confidence intervals and in tests involving F-distributions.

However, values that cut particular probabilities from the lower tails of F-distributions are often of use, even though they are not provided in most printed F-tables. (Nowadays, this hardly causes difficulty because statistical calculators and computer software provide almost any quantiles of practical interest.)

Suppose that for $X \sim \mathsf{F}(10, 15),$ I want to know the value $k$ such that $P(X \le k) = 0.025,$ which is not in my F-table. From R, I can find that $k = 0.2840.$

k = qf(.025, 10, 15);  k
[1] 0.2839559
pf(k, 10, 15)
[1] 0.025

Now let's look at some additional values for the related distribution $\mathsf{F}(15, 10),$ with the degrees of freedom swapped. The following information is in my table: the value that cuts probability $0.025$ from its upper tail (quantile $0.975$) is $3.52.$ In R, this is equivalent to the following (where some extra decimal places are available):

h = qf(.975, 15, 10);  h
[1] 3.521673
pf(h, 15, 10)
[1] 0.975

Your equation $P(X \le a) + P\left(\frac{1}{X} \le \frac 1a\right) = 1,$ here with $X \sim \mathsf{F}(15,10)$ and $a = h,$ can be illustrated in R as follows:

pf(h, 15, 10) + pf(1/h, 10, 15);  1/h
[1] 1
[1] 0.2839559    # 1/h

Notice that the information from the second term, $P\left(\frac{1}{X} \le 1/h\right)$ is not available in a printed table because it involves the 'lower half' of $Y = 1/X \sim \mathsf{F}(10,15).$

pf(1/h, 10, 15)
[1] 0.025

It follows that we can find the value that cuts probability $0.025$ from the lower tail of $\mathsf{F}(10,15)$ from a printed table. It can be found as the reciprocal of the value that cuts $0.025$ from the upper tail of $\mathsf{F}(15, 10).$

 1 / qf(.975, 15,10)
 [1] 0.2839559       # can be found from table
 qf(.025, 10, 15)
 [1] 0.2839559       # not explicitly printed in table
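The table-lookup trick generalizes to any tail probability. Below is a small sketch of it as a function; `lower_tail_q` is a hypothetical helper name introduced here, and `qf` stands in for the printed table of upper-tail values.

```r
# Hypothetical helper: lower-tail quantile of F(m, n) obtained from an
# upper-tail quantile of F(n, m), the kind of value a printed table provides.
lower_tail_q <- function(p, m, n) {
  upper <- qf(1 - p, n, m)  # cuts probability p from the upper tail of F(n, m)
  1 / upper                 # reciprocal cuts probability p from the lower tail of F(m, n)
}

p <- c(0.001, 0.025, 0.05, 0.10, 0.25)
from_table <- lower_tail_q(p, 10, 15)
direct     <- qf(p, 10, 15)
print(cbind(p, from_table, direct))
```

The last two columns should match to within floating-point error for every tabulated percentage point.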

This gave me the intuition for the question but I still don't know how to prove this mathematically. — Aakash Malviya, Nov 06 '20 at 13:28
The right tail of F(m,n) tells you much about the left tail of F(n,m) because of the reciprocal relationship from the definition of an F distribution. — BruceET, Nov 06 '20 at 16:37

whuber · Answer 2 · 2021-01-30T00:03:55.830

I will share three general techniques for working with density functions that are remarkably simple and useful: normalization, scaling, and identifying symmetries. (A fourth general technique, recentering, often is useful but isn't applicable in this situation.)

By following these principles, along with a healthy application of the "principle of mathematical laziness," you can just look at a formula for the F-Ratio density and deduce the symmetry relationship expressed in the question.

Normalization

The Normalization Principle

Any function $f$ that is never negative, can be integrated, and has a finite integral $I[f]$ corresponds to a density: all we need do is divide all the values of $f$ by its integral and the result satisfies all the properties of a density.

When you are examining any mathematical expression for $f,$ then, it often is convenient just to ignore any constant multiplicative factors that may appear. ("Constant" means they don't depend on the argument of $f$ -- but often they may depend on other quantities, usually known as "parameters.") Those factors will be determined by the integral of all the other factors. Let's call this the "Normalization Principle."

In this case, one expression for the F-Ratio density (from the Wikipedia article) is

$$f(x;d_1,d_2) = \color{blue}{\frac{1}{B\left(\frac{d_1}{2},\frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{\frac{d_1}{2}}}\, x^{\frac{d_1}{2}-1}\left(1 + \frac{d_1}{d_2}x\right)^{-\frac{d_1+d_2}{2}}.\tag{1}$$

I have colored some terms that do not depend on the argument $x:$ we are free to ignore these, because they are determined by the Normalization Principle.
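The Normalization Principle can be checked numerically. The sketch below keeps only the factors of $(1)$ that depend on $x,$ integrates them, and compares the normalized result with R's `df`. The degrees of freedom and the names `g` and `total` are illustrative choices, and a tight `rel.tol` is assumed to be enough for `integrate` to resolve the small integral accurately.

```r
# Sketch: recover the F(d1, d2) density from its non-constant factors alone.
d1 <- 10; d2 <- 15   # illustrative degrees of freedom

# Expression (1) with the constant factors dropped
g <- function(x) x^(d1/2 - 1) * (1 + d1/d2 * x)^(-(d1 + d2)/2)

# The integral of g determines the normalizing constant
total <- integrate(g, 0, Inf, rel.tol = 1e-8)$value

x <- c(0.25, 0.5, 1, 2, 4)
normalized <- g(x) / total
print(cbind(x, normalized, df = df(x, d1, d2)))

# The ignored constant is 1 / total; compare with the closed form in (1)
print(c(numeric = 1 / total, closed_form = (d1/d2)^(d1/2) / beta(d1/2, d2/2)))
```

Nothing about the Beta function was needed to obtain the density values: the integral of the remaining factors supplies the constant.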

You have choices

There is always more than one way to partition a density's expression into a constant factor and the other factors. Indeed, you may even introduce new factors in the expression if that helps you simplify or understand it -- so long as they don't depend on the function's argument. I will illustrate this in the next section, where I expand the blue factors to include more terms that appear.

Scaling

Distributions are best analyzed by examining the entire expression that you would integrate. This expression is sometimes called the probability element: it is the product of the density $f_X(x)$ (representing the height on its graph) and the differential element $\mathrm{d}x$ (representing small distances along the base of the graph of $f_X$). The product has units of probability -- and it's the probability, not the density, that behaves most simply under analysis.

Consider, then, a probability element $f_X(x)\,\mathrm{d}x$ and suppose you were to change the units of measurement in which $x$ is expressed. For instance, if $x$ represents a length in centimeters and you prefer to work in inches, you would divide $x$ by $2.54$ centimeters per inch. This obviously doesn't change anything about any probabilities: it merely reflects how you choose to write down numbers.

Mathematically, when we divide $x$ by a positive number $\lambda$ to produce $y=x/\lambda,$ probabilities do not change, so the probability element of $Y,$ rewritten in terms of $x,$ must equal the probability element of $X$:

$$f_X(x)\,\mathrm{d}x = f_Y(y)\,\mathrm{d}y = f_Y(x/\lambda)\,\mathrm{d}(x/\lambda) = f_Y(x/\lambda)\,\left(\mathrm{d}x\right)/\lambda = \frac{1}{\lambda} \left(f_Y(x/\lambda)\,\mathrm{d}x\right).$$

If now, as is conventional, we omit writing the differential element $\mathrm{d}x,$ the preceding equation reads

$$f_X(x) = \frac{1}{\lambda} f_Y(x/\lambda).\tag{2}$$

The appearance of the factor of $\lambda$ is a mystery until you appreciate it is the last visible trace of the now-vanished differential element. Let's call this the "density scaling law." (Understanding how $f_X$ changes when $x$ is transformed also permits us to drop the subscripts from $f$ from now on.)

Exploit the density scaling law by working backwards. Whenever you look at the mathematical expression of a density, look for constant terms that multiply the argument: these are candidates for the scale factor $1/\lambda.$

Let's re-examine the F-Ratio density $(1)$ from this perspective. It contains two terms involving "$x$." Focus on the one in which $x$ is multiplied by a constant:

$$f(x;d_1,d_2) = \frac{1}{B\left(\frac{d_1}{2},\frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{\frac{d_1}{2}}\, x^{\frac{d_1}{2}-1}\left(1 + \color{red}{\frac{d_1}{d_2}x}\right)^{-\frac{d_1+d_2}{2}}.$$

There it is in red, all by itself, multiplied by a constant $d_1/d_2 = 1/\lambda.$ What you want to do is replace $x/\lambda$ by $y$ or, equivalently, just plug $\lambda y = (d_2/d_1)\,y$ into the formula everywhere in place of $x.$ In so doing you will be neglecting that ghostly $\lambda$ factor that appeared in $(2),$ but the Normalization Principle tells us this doesn't matter.

This is a perfectly routine substitution, requiring only basic algebra to simplify:

$$\begin{aligned} f(y;d_1,d_2) &\propto \frac{1}{B\left(\frac{d_1}{2},\frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{\frac{d_1}{2}}\, (d_2/d_1\, y)^{\frac{d_1}{2}-1}\left(1 + \color{red}{\frac{d_1}{d_2}(d_2/d_1\, y)}\right)^{-\frac{d_1+d_2}{2}}\\ &= \color{blue}{\frac{1}{B\left(\frac{d_1}{2},\frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{\frac{d_1}{2}}\, \left(\frac{d_2}{d_1}\right)^{\frac{d_1}{2}-1}}\, y^{\frac{d_1}{2}-1}\left(1 + \color{red}{y}\right)^{-\frac{d_1+d_2}{2}} \end{aligned} .$$

Although there is an obvious nice cancellation of the fractions involving $d_1/d_2$ and $d_2/d_1,$ I didn't bother to go further with the simplification because the Normalization Principle tells us we don't have to do any more work: these new factors merely contribute to the multiplicative "fluff" that we may freely ignore. That's why I am showing all of it in blue.

Notice the appearance at the outset of the "$\propto$" ("is proportional to") relation: it acknowledges that I neglected the constant factor $\lambda = d_2/d_1$ called for by the density scaling law.

These last two steps are examples of what I call the Principle of Mathematical Laziness: don't do calculations until you have to, because often you will discover you don't have to do them at all.

Invoking the Normalization Principle once more, let's look at what's left of $f:$

$$f(y;d_1,d_2)\ \propto\ y^{\frac{d_1}{2}-1}\left(1 + y\right)^{-\frac{d_1+d_2}{2}}.\tag{3}$$

Let's call this the "nucleus" of $f:$ any factors multiplying its argument $y$ have been stripped away and any constant factors multiplying its value have been ignored. Typically, you can just look at the formula of a density and see its nucleus. It usually takes no work at all.
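A quick numerical check of the nucleus is possible. This is a sketch under two stated assumptions: the nucleus integrates to $B(d_1/2, d_2/2)$ (the standard Beta integral), and the usual change-of-variables formula gives the density of $Y = X/\lambda$ as $\lambda f(\lambda y).$ The function names are illustrative.

```r
# Sketch: the rescaled F density equals the normalized nucleus (3).
d1 <- 10; d2 <- 15
lambda <- d2 / d1   # so that y = x / lambda = (d1/d2) * x

# Nucleus (3) divided by its integral B(d1/2, d2/2)
nucleus_density <- function(y) {
  y^(d1/2 - 1) * (1 + y)^(-(d1 + d2)/2) / beta(d1/2, d2/2)
}

# Density of Y = X / lambda by change of variables, using R's F density
scaled_density <- function(y) lambda * df(lambda * y, d1, d2)

y <- c(0.1, 0.5, 1, 2, 5)
print(cbind(y, nucleus = nucleus_density(y), scaled = scaled_density(y)))
```

All the factors involving $d_1/d_2$ that were left unsimplified have been absorbed into the single constant $1/B(d_1/2, d_2/2),$ as the Normalization Principle promised.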

Symmetry

The appearance of $-1$ in the power of $y$ in the nucleus $(3),$ along with the fact that $f$ is supported on the positive numbers $(0,\infty),$ hints at a multiplicative symmetry. (For more on the $-1$ terms in the powers, see https://stats.stackexchange.com/a/263842/919.)

Symmetry is a powerful, general concept that I can only hint at here by showing how it applies to the F-Ratio distribution.

The beautiful thing about the differential element $\mathrm{d}y/y$ is how it transforms when we take powers of $y:$

$$\frac{\mathrm{d}(y^p)}{y^p} =\frac{py^{p-1}\mathrm{d}y}{y^p} = p\frac{\mathrm{d}y}{y}\ \propto\ \pm\frac{\mathrm{d}y}{y}.$$

That is, after invoking the Normalization Principle, $\mathrm{d}y/y$ does not change when we take a positive power of $y$ and otherwise, for negative powers, it is merely negated. The negation reflects the fact that transforming $y$ to a negative power of $y$ reverses order: when $x \gt y \gt 0$ and $p\lt 0,$ $0 \lt x^p \lt y^p.$ We need to keep track of any such reversals because they subtly change the interpretation of $f:$ more about this at the end.

Take the appearance of "$-1$" in the power of $y$ in the nucleus $(3)$ as an invitation to rewrite the density as a differential element in terms of $\mathrm{d}y/y.$ This is the final simplification:

$$f(y;d_1,d_2)\,\mathrm{d}y\ \propto\ y^{\frac{d_1}{2}}\left(1 + y\right)^{-\frac{d_1+d_2}{2}}\,\frac{\mathrm{d}y}{y}.\tag{4}$$

Upon applying the transformation $y = u^{-1}$ in the question, without any work at all we immediately see

$$f(u;d_1,d_2)\,\mathrm{d}u\ \propto\ -(u^{-1})^{\frac{d_1}{2}}\left(1 + u^{-1}\right)^{-\frac{d_1+d_2}{2}}\,\frac{\mathrm{d}u}{u} = \frac{(u^{-1})^{\frac{d_1}{2}}}{\left(1 + u^{-1}\right)^{\frac{d_1+d_2}{2}}}\,\left(-\frac{\mathrm{d}u}{u}\right).$$

Simplifying the fraction (in the usual manner) by multiplying numerator and denominator by a suitable power of $u$ (namely, the $(d_1+d_2)/2$ power) yields

$$f(u;d_1,d_2)\,\mathrm{d}u\ \propto\ \frac{u^{\frac{d_2}{2}}}{\left(1 + u\right)^{\frac{d_1+d_2}{2}}}\,\left(-\frac{\mathrm{d}u}{u}\right) =u^{\frac{d_2}{2}}\left(1 + u\right)^{-\frac{d_1+d_2}{2}}\,\left(-\frac{\mathrm{d}u}{u}\right).\tag{5}$$

Comparing $(5)$ to $(4)$ shows that upon transforming $y$ to $1/y,$ the nucleus with parameters $(d_1,d_2)$ becomes the nucleus with parameters $(d_2,d_1)$ with a negative sign.
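The parameter swap can be confirmed numerically for the full density, not just its nucleus. In this sketch the density of $U = 1/X$ is computed from the standard change-of-variables formula $f_X(1/u)/u^2$ (the absolute value of the Jacobian removes the negative sign), and the names are illustrative.

```r
# Sketch: the density of 1/X for X ~ F(d1, d2) is the F(d2, d1) density.
d1 <- 10; d2 <- 15
u <- c(0.2, 0.5, 1, 2, 5)

# f_X(1/u) * |d(1/u)/du| = f_X(1/u) / u^2
recip_density <- df(1 / u, d1, d2) / u^2
swapped       <- df(u, d2, d1)
print(cbind(u, recip_density, swapped))
```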

Conclusion

The appearance of the negative sign means that at some point we transformed the variables in an order-reversing way. Originally, to compute the probability of an interval $(-\infty, a]$ we would have integrated the probability element over the interval $(0, a].$ Because of the transformations we made, we must transform its endpoint $a$ appropriately to some new value $a^\prime$ and, due to the order reversal, we must now integrate over the interval $[a^\prime, \infty).$

Because the integral will be the same whether or not we include $a^\prime$ in that interval, the conclusion in the question follows immediately: when $X$ has an F-Ratio distribution with parameters $d_1,d_2$ we may compute $\Pr(X\le a)$ either by integrating $f_X$ from $0$ to $a$ or by integrating $f_U = f_{1/X},$ which has an F-Ratio distribution with parameters $d_2,d_1,$ from $a^\prime = 1/a$ to $\infty,$ which of course is just $1 - \Pr(U \le 1/a).$
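The conclusion is easy to spot-check with R's `pf` over a grid of values. The grid below is an arbitrary illustration, not a proof; `total` and `tail_gap` are names introduced here.

```r
# Sketch: check P(X <= a) + P(U <= 1/a) = 1 for X ~ F(m, n), U ~ F(n, m).
grid <- expand.grid(a = c(0.1, 0.5, 1, 2, 10),
                    m = c(1, 3, 10, 40),
                    n = c(1, 7, 15))

# Left-hand side of the identity in the question
grid$total <- with(grid, pf(a, m, n) + pf(1 / a, n, m))

# Equivalent upper-tail form: P(X <= a) - P(U >= 1/a) should be 0
grid$tail_gap <- with(grid, pf(a, m, n) - pf(1 / a, n, m, lower.tail = FALSE))

print(range(grid$total))
print(range(grid$tail_gap))
```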

If you would like to practice these techniques further, virtually every distribution (over 40 of them) in the Wikipedia list of continuous distributions supported on semi-infinite intervals can be analyzed in the same way.
