As far as I understand, a high F-stat leads to a high $R^2$, though the converse is not true. What does it mean if I have a high F-stat and a low $R^2$?
2 Answers
I also posted a longer explanation on my website, beginning with an introduction to these concepts and citing this question.
The F-statistic between two models, the null model (intercept only) $m_0$ and the alternative model $m_1$ ($m_0$ is nested within $m_1$) is:
$$F = \frac{\left( \frac{RSS_0-RSS_1}{p_1-p_0} \right)} {\left( \frac{RSS_1}{n-p_1} \right)} = \left( \frac{RSS_0-RSS_1}{p_1-p_0} \right) \left( \frac{n-p_1}{RSS_1} \right)$$
$R^2$ on the other hand, is defined as:
$$ R^2 = 1-\frac{RSS_1}{RSS_0} $$
Rearranging $F$ we can see that: $$F = \left( \frac{RSS_0-RSS_1}{RSS_1} \right) \left( \frac{n-p_1}{p_1-p_0} \right) = \left( \frac{RSS_0}{RSS_1}-1 \right) \left( \frac{n-p_1}{p_1-p_0} \right) = \left( \frac{R^2}{1-R^2} \right) \left( \frac{n-p_1}{p_1-p_0} \right)$$
From this form it is clear that a large $n$ alone suffices to produce a large $F$, even when $R^2$ is small. In other words, the relationship between $F$ and $R^2$ is not as straightforward as one might think, and with sufficiently large $n$ there is almost always enough power to reject the null hypothesis.
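The identity above can be checked numerically. The following is my own sketch (not part of the original answer), using plain NumPy least squares: it fits an intercept-plus-slope model to noisy data, computes $F$ directly from the residual sums of squares, and again from $R^2$ via the rearranged formula, with $p_0 = 1$ and $p_1 = 2$.

```python
import numpy as np

# Sketch: verify F = (R^2 / (1 - R^2)) * (n - p1) / (p1 - p0)
# for simple regression (null model p0 = 1, alternative p1 = 2).
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(0, 10, n)
y = 0.01 * x + rng.normal(0, 10, n)   # tiny slope, large noise

# Fit the alternative model y ~ 1 + x by least squares.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss1 = np.sum((y - X @ beta) ** 2)
rss0 = np.sum((y - y.mean()) ** 2)    # null model: intercept only

p0, p1 = 1, 2
F_direct = ((rss0 - rss1) / (p1 - p0)) / (rss1 / (n - p1))
r2 = 1 - rss1 / rss0
F_from_r2 = (r2 / (1 - r2)) * (n - p1) / (p1 - p0)

print(r2)                    # tiny R^2
print(F_direct, F_from_r2)   # the two F computations agree; F is large
```

Despite an $R^2$ on the order of $10^{-4}$, the huge $n$ makes $F$ large, illustrating the point above.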

This can happen when you reject the null that the coefficients of your model (excluding the intercept) are 0, but the variance of the residuals is still very large. Here is an example:
set.seed(0)
N = 1000000
x = rnorm(N, 0, 10)
y = 0.01*x + 0.02 + rnorm(N, 0, 10)  # tiny slope, noise sd as large as sd(x)
s = summary(lm(y ~ x))
s$fstatistic  # large F statistic
s$r.squared   # tiny R^2
If you run this code, you will find the F statistic is about 105 but the $R^2$ is less than 0.0001. We have plenty of data to detect that the coefficient for x is not 0, but the residual variance is not much different than the marginal variance of y, leading to a small $R^2$.
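A back-of-envelope calculation (my own sketch, not part of the original answer) shows these numbers are what the simulation should produce: with slope $b = 0.01$, $\mathrm{sd}(x) = 10$, and noise sd $10$, the population $R^2$ is $b^2\operatorname{var}(x) / (b^2\operatorname{var}(x) + \operatorname{var}(\varepsilon))$, and the expected $F$ follows from the identity in the other answer with $p_0 = 1$, $p_1 = 2$.

```python
# Back-of-envelope check of the simulated R^2 and F statistic.
b, var_x, var_eps = 0.01, 10**2, 10**2
n, p0, p1 = 1_000_000, 1, 2

signal = b**2 * var_x                  # explained variance: 0.01
r2_pop = signal / (signal + var_eps)   # population R^2, about 1e-4
F_expected = (r2_pop / (1 - r2_pop)) * (n - p1) / (p1 - p0)

print(r2_pop)      # ~9.999e-05, below the 0.0001 seen in the R output
print(F_expected)  # ~100, consistent with the observed F of about 105
```

So the small $R^2$ and large $F$ are not a fluke of the seed; they follow directly from the chosen slope, noise level, and sample size.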
