
As far as I understand, a high F-stat leads to a high $R^2$, though the converse is not true. What does it mean if I have a high F-stat and a low $R^2$?

Firebug
pythonuser

2 Answers


I also added a longer explanation, beginning with an introduction to these concepts and citing this question, at my website.


The F-statistic comparing two models, the null model (intercept only) $m_0$ and the alternative model $m_1$ (with $m_0$ nested within $m_1$), is:

$$F = \frac{\left( \frac{RSS_0-RSS_1}{p_1-p_0} \right)} {\left( \frac{RSS_1}{n-p_1} \right)} = \left( \frac{RSS_0-RSS_1}{p_1-p_0} \right) \left( \frac{n-p_1}{RSS_1} \right)$$

$R^2$ on the other hand, is defined as:

$$ R^2 = 1-\frac{RSS_1}{RSS_0} $$

Rearranging $F$ we can see that: $$F = \left( \frac{RSS_0-RSS_1}{RSS_1} \right) \left( \frac{n-p_1}{p_1-p_0} \right) = \left( \frac{RSS_0}{RSS_1}-1 \right) \left( \frac{n-p_1}{p_1-p_0} \right) = \left( \frac{R^2}{1-R^2} \right) \left( \frac{n-p_1}{p_1-p_0} \right)$$
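This identity is easy to check numerically. Below is a small sketch in Python with `numpy` (rather than R, which the other answer uses); the simulated data and sample size are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p0, p1 = 200, 1, 2  # intercept-only model vs. intercept + one slope
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)

X0 = np.ones((n, 1))                   # null model design matrix
X1 = np.column_stack([np.ones(n), x])  # alternative model design matrix
rss0 = np.sum((y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]) ** 2)
rss1 = np.sum((y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]) ** 2)

# F computed directly from the RSS definition ...
F_direct = ((rss0 - rss1) / (p1 - p0)) / (rss1 / (n - p1))
# ... and from R^2 via the rearranged identity
r2 = 1 - rss1 / rss0
F_from_r2 = (r2 / (1 - r2)) * ((n - p1) / (p1 - p0))
assert np.isclose(F_direct, F_from_r2)
```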

Holding $R^2$ fixed, it suffices that $n$ be large for $F$ to be large. In other words, the relationship between $F$ and $R^2$ is not as straightforward as one might think, and with sufficiently large $n$ there is almost always enough power to reject the null hypothesis.
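To make the role of $n$ concrete, hold $R^2$ fixed at a weak 1% and evaluate the identity at increasing sample sizes (a Python sketch; the specific values of $R^2$ and $n$ are just illustrative):

```python
r2, p0, p1 = 0.01, 1, 2  # a weak fit: R^2 of 1%, one predictor plus intercept
for n in (100, 10_000, 1_000_000):
    # F from the rearranged identity: (R^2 / (1 - R^2)) * ((n - p1) / (p1 - p0))
    F = (r2 / (1 - r2)) * ((n - p1) / (p1 - p0))
    print(n, round(F, 1))
# F grows linearly in n: roughly 1.0 at n = 100, 101.0 at n = 10^4,
# and 10101.0 at n = 10^6, despite R^2 never moving from 1%
```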

kjetil b halvorsen
Firebug

This can happen when you reject the null hypothesis that the coefficients of your model (excluding the intercept) are 0, but the variance of the residuals is still very large. Here is an example:

set.seed(0)
N <- 1000000
x <- rnorm(N, 0, 10)
y <- 0.01 * x + 0.02 + rnorm(N, 0, 10)  # tiny slope, large noise

s <- summary(lm(y ~ x))
s$fstatistic
s$r.squared

If you run this code, you will find that the F-statistic is about 105 while $R^2$ is below 0.0001. With this much data we can genuinely detect that the coefficient of x is not 0, but the residual variance is barely different from the marginal variance of y, which leads to a small $R^2$.
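The numbers line up with the other answer's identity: with $R^2 \approx 10^{-4}$ and $n = 10^6$, $F \approx \frac{R^2}{1-R^2}(n-2) \approx 100$, in the same ballpark as the 105 reported. Here is a replication of the simulation in Python with `numpy` (note the random seed mechanism differs from R's, so the exact figures will vary slightly):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(0, 10, n)
y = 0.01 * x + 0.02 + rng.normal(0, 10, n)  # tiny slope, large noise

# Fit y ~ x by least squares; lstsq returns the residual sum of squares
X = np.column_stack([np.ones(n), x])
beta, resid, *_ = np.linalg.lstsq(X, y, rcond=None)
rss1 = float(resid[0])
rss0 = np.sum((y - y.mean()) ** 2)  # RSS of the intercept-only model

r2 = 1 - rss1 / rss0
F = (r2 / (1 - r2)) * (n - 2)
print(r2, F)  # tiny R^2, yet F far exceeds any conventional critical value
```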

Demetri Pananos