
I was wondering if there is a relationship between $R^2$ and an F-test.

Usually $$R^2=\frac {\sum (\hat Y_t - \bar Y)^2 /(T-1)} {\sum( Y_t - \bar Y)^2 /(T-1)}$$ and it measures the strength of the linear relationship in the regression.

An F-test, on the other hand, just tests a hypothesis.

Is there a relationship between $R^2$ and an F-test?

MDEWITT
Le Max
  • The formula for $R^2$ looks incorrect, not just because it's missing some characters in the denominator: those "$-1$" terms don't belong. The correct formula looks much more like an $F$ statistic :-). – whuber Apr 22 '13 at 17:28
  • See http://stats.stackexchange.com/questions/58107/conditional-expectation-of-r-squared/58133#58133 – Stéphane Laurent May 21 '13 at 09:31

4 Answers


Recall that in a regression setting, the F statistic is expressed in the following way.

$$ F = \frac{(TSS - RSS)/(p-1)}{RSS/(n-p)} $$

where TSS = total sum of squares and RSS = residual sum of squares, $p$ is the number of predictors (including the constant) and $n$ is the number of observations. Under the null hypothesis that all non-intercept coefficients are zero, this statistic has an $F$ distribution with degrees of freedom $p-1$ and $n-p$.

Also recall that $$ R^2 = 1 - \frac{RSS}{TSS} = \frac{TSS - RSS}{TSS} $$

Simple algebra then shows that $$ R^2 = 1 - \left(1 + F \cdot \frac{p-1}{n-p}\right)^{-1} $$

where $F$ is the $F$ statistic from above: $F \cdot \frac{p-1}{n-p} = \frac{TSS-RSS}{RSS}$, so $1 + F \cdot \frac{p-1}{n-p} = \frac{TSS}{RSS}$, whose reciprocal is $1 - R^2$.

This is the theoretical relationship between the F statistic (or the F test) and $R^2$.

The practical interpretation is that a bigger $R^2$ leads to a larger value of $F$: if $R^2$ is big (which means that a linear model fits the data well), then the corresponding $F$ statistic should be large, which means that there should be strong evidence that at least some of the coefficients are non-zero.
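
As a quick check of this identity, here is a minimal sketch using numpy (the simulated data, sample size, and number of predictors are arbitrary choices for illustration, not from the answer):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3   # n observations, p coefficients (including the constant)

# Simulated design matrix (constant column plus p-1 predictors) and response
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Ordinary least squares fit
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
RSS = np.sum((y - X @ beta) ** 2)
TSS = np.sum((y - y.mean()) ** 2)

R2 = 1 - RSS / TSS
F = ((TSS - RSS) / (p - 1)) / (RSS / (n - p))

# The two sides of the identity agree to floating-point precision
print(R2, 1 - 1 / (1 + F * (p - 1) / (n - p)))
```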

Glen_b
Zheng Li

If all the assumptions hold and you have the correct form for $R^2$, then the usual $F$ statistic can be computed as $F = \frac{ R^2 }{ 1- R^2} \times \frac{ \text{df}_2 }{ \text{df}_1 }$, where $\text{df}_1 = k$ is the numerator degrees of freedom ($k$ being the number of predictors) and $\text{df}_2 = n-(k+1)$ is the denominator degrees of freedom ($n$ being the number of observations). This value can then be compared to the appropriate F distribution to do an F test. This can be derived/confirmed with basic algebra.
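
As a numerical illustration, a minimal sketch in Python with scipy (the values of $R^2$, $n$, and $k$ below are made-up inputs, not from the answer):

```python
from scipy import stats

R2 = 0.40         # hypothetical R-squared from some fitted model
n, k = 50, 3      # hypothetical number of observations and predictors
df1, df2 = k, n - (k + 1)   # numerator and denominator degrees of freedom

# F statistic from R-squared, as in the formula above
F = (R2 / (1 - R2)) * (df2 / df1)

# p-value: upper-tail probability under the F(df1, df2) distribution
p_value = stats.f.sf(F, df1, df2)
print(F, p_value)
```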

Greg Snow
  • 2
    could you please define df1 and df2? – bonobo Jan 20 '17 at 14:49
  • 1
    @bonobo, df1 is the numerator degrees of freedom (based on the number of predictors) and df2 is the denominator degrees of freedom. – Greg Snow Jan 20 '17 at 17:23
  • 3
    To clarify further about the degrees of freedom: df1=k, where k is number of predictors. df1 is called the "numerator degrees of freedom," even though it's in the denominator in this formula. df2=n−(k+1), where n is the number of observations and k is the number of predictors. df2 is called the "denominator degrees of freedom," even though it's in the numerator in this formula. – Tim Swast Feb 11 '18 at 17:38
  • 7
    @GregSnow could you consider adding the definitions for the degrees of freedom to the answer? I suggested such a change at https://stats.stackexchange.com/review/suggested-edits/175306 but it was rejected. – Tim Swast Feb 11 '18 at 17:45

Intuitively, I like to think that the F-ratio first gives a yes-no answer to the question, 'can I reject $H_0$?' (reject when the ratio is much larger than 1, or the p-value is less than $\alpha$).

Then, if I determine I can reject $H_0$, $R^2$ indicates the strength of the relationship between the response and the predictors.

In other words, a large F-ratio indicates that there is a relationship; a high $R^2$ then indicates how strong that relationship is.

Nick Head

Also, quickly:

$$R^2 = \frac{F}{F + (n-p)/(p-1)}$$

E.g., the $R^2$ for a 1-df $F$ test with $F = 2.53$ and sample size 21 would be:

$$R^2 = \frac{2.53}{2.53 + 19} = 0.1175$$
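
A two-line check of that arithmetic in Python (using the numbers from the example, with $p = 2$ coefficients including the constant):

```python
F, n, p = 2.53, 21, 2                 # F statistic, sample size, coefficients
print(F / (F + (n - p) / (p - 1)))    # ≈ 0.1175
```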

rystoli