
Let's assume we have a simple linear regression: $\hat{y} = bx + \text{intercept}$.

Is it possible to have a high p-value and a high $R^2$ (or a low p-value and a low $R^2$)? I've been looking for examples of this. For regressions with multiple predictors, I've seen examples where the p-values for some coefficients are low while the overall $R^2$ is also low, but I was wondering whether this is possible for a regression with a single predictor.

Sycorax
user98235

2 Answers


Yes, it is possible. The $R^2$ and the $t$ statistic (used to compute the p-value) are related exactly by:

$|t| = \sqrt{\frac{R^2}{1 - R^2}(n - 2)}$

Therefore, you can have a high $R^2$ with a high p-value (a low $|t|$) if you have a small sample.
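The identity can be checked numerically. A quick sketch in Python (the code in this answer is R, but the relationship is the same; `scipy` and `numpy` are assumed available):

```python
# Numerical check of the identity |t| = sqrt(R^2/(1-R^2) * (n-2))
# for simple linear regression. Python sketch; scipy/numpy assumed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 1 + x + rng.normal(size=n)

res = stats.linregress(x, y)
r2 = res.rvalue**2

# |t| recovered from R^2 alone via the identity
t_from_r2 = np.sqrt(r2 / (1 - r2) * (n - 2))
# |t| computed directly from the slope and its standard error
t_direct = abs(res.slope / res.stderr)

print(t_from_r2, t_direct)  # the two agree to floating-point precision
```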

For instance, take $n = 3$. For this sample size to give you a (two-sided) p-value less than 10%, you would need an $R^2$ greater than about 97.5% -- anything less than that would give you a "non-significant" p-value.
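Inverting the identity at the critical $t$ gives the exact $R^2$ threshold for any $n$ and significance level. A Python sketch, with `scipy` assumed:

```python
# For n = 3 (1 residual degree of freedom), find the R^2 needed for a
# two-sided p-value below 10% by inverting |t| = sqrt(R^2/(1-R^2)*(n-2)).
# Python sketch; scipy assumed.
from scipy import stats

n = 3
t_crit = stats.t.ppf(1 - 0.10 / 2, df=n - 2)   # ≈ 6.314
r2_needed = t_crit**2 / (t_crit**2 + (n - 2))  # ≈ 0.975
print(r2_needed)
```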

As a concrete example, the simulation below produces an $R^2$ close to 0.5 with a p-value of $0.516$.

set.seed(10)
n <- 3
x <- rnorm(n, 0, 1)
y <- 1 + x + rnorm(n, 0, 1)
summary(m1 <- lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
       1        2        3 
-0.36552  0.42802 -0.06251 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.7756     0.4261    1.82    0.320
x             0.5065     0.5333    0.95    0.516

Residual standard error: 0.5663 on 1 degrees of freedom
Multiple R-squared:  0.4743,    Adjusted R-squared:  -0.05148 
F-statistic: 0.9021 on 1 and 1 DF,  p-value: 0.5164

For the opposite case (a low p-value with a low $R^2$), you can obtain that trivially by setting up a regression where $x$ has low explanatory power and letting $n \to \infty$, which drives the p-value as small as you want.
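That large-$n$ case can be sketched as follows (Python analogue of the R setup above; `scipy`/`numpy` assumed, and the slope of 0.05 is just an arbitrary weak effect):

```python
# A weak predictor (tiny R^2) that is still highly "significant"
# because the sample is large. Python sketch; scipy/numpy assumed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)  # x explains very little variance

res = stats.linregress(x, y)
print(res.rvalue**2, res.pvalue)  # tiny R^2, vanishing p-value
```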

Carlos Cinelli
  • If x and y are uncorrelated noise, then one should have a low p value, otoh a large sample size should give a decent R^2. Shouldn't that work? – meh Jan 23 '17 at 01:29
  • @aginensky no, it shouldn't. A large sample size improves how well you estimate R^2 and not how big R^2 is. If x and y are uncorrelated, your R^2 will converge to zero as n -> infty. – Carlos Cinelli Jan 23 '17 at 01:40
  • @carloscinelli thanks for your answer. I guess assuming the sample size is sufficiently high, it's impossible to have both high R^2 and p-value at the same time for simple linear regression like this. – user98235 Jan 23 '17 at 05:02
  • 1
    @user98235 yes, and you can actually compute this exactly. For instance, if n = 102, then any R^2 > 4% will give p-values < 5%. – Carlos Cinelli Jan 23 '17 at 05:09
  • @ carloscinelli - of course, my bad ! – meh Jan 23 '17 at 10:19
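The $n = 102$ figure from the comments can be checked the same way, by inverting the $t$–$R^2$ identity at the two-sided 5% critical value (Python sketch; `scipy` assumed):

```python
# With n = 102 (100 residual df), find the R^2 at the two-sided 5%
# significance boundary. Python sketch; scipy assumed.
from scipy import stats

n = 102
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)
r2_boundary = t_crit**2 / (t_crit**2 + (n - 2))
print(r2_boundary)  # ≈ 0.038, so any R^2 above roughly 4% is "significant"
```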

This looks like self-study, so I'll offer a hint: are either or both of these measures ($R^2$ and the p-value) related to the sample size?

zbicyclist