If you regress a randomly generated dependent variable on randomly generated independent variables, is the expected R-squared value simply a function of n (the number of observations) and k (the number of independent variables)? If so, why is this?

In some old regression course notes I was re-reading, I see that the expected R-squared in this case is k / (n-k-1). I tried this with some randomly generated data (e.g. n=100 and k=20) and indeed got a value very close to 0.2532 (that is, 20/79), but I don't understand how it can be this simple. Thanks for any insight anyone might have.

huckleberry
  • I cannot reproduce your formula or your simulation. For the formula I obtain $k/(n-1)$, and my simulations in `R` agree with this. Note that your formula yields nonsense whenever $k$ exceeds $n-k-1$: it gives an expected $R^2$ greater than $1$! – whuber Dec 23 '17 at 23:45
  • The math seems to be correct according to... https://stats.stackexchange.com/questions/181652/expected-value-of-r2-the-coefficient-of-determination-under-the-null-hypoth by https://stats.stackexchange.com/users/28746/alecos-papadopoulos – jeffalltogether Dec 23 '17 at 23:55
  • @jeffalltogether whuber's math is correct, indeed. And this question can be considered a duplicate. – DeltaIV Dec 24 '17 at 07:54
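
A small Monte Carlo run is one way to compare the $k/(n-1)$ from the comments with the $k/(n-k-1)$ from the course notes. The sketch below uses plain Python rather than `R`, and it assumes i.i.d. standard normal data and a model that includes an intercept; the helper names `r_squared` and `mean_r_squared` are made up for illustration. Under those assumptions $R^2$ follows a Beta distribution with parameters $k/2$ and $(n-k-1)/2$, whose mean is $k/(n-1)$, so the simulated average should land near that value.

```python
import random


def r_squared(y, columns):
    """R^2 of an OLS fit of y on the given columns plus an intercept."""
    n = len(y)

    def center(v):
        m = sum(v) / n
        return [x - m for x in v]

    # Centering y and every regressor accounts for the intercept.
    resid = center(y)
    tss = sum(r * r for r in resid)

    basis = []  # orthonormal basis for the centered regressors
    for col in columns:
        q = center(col)
        # Modified Gram-Schmidt: strip what earlier regressors already span.
        for b in basis:
            coef = sum(qi * bi for qi, bi in zip(q, b))
            q = [qi - coef * bi for qi, bi in zip(q, b)]
        norm = sum(qi * qi for qi in q) ** 0.5
        q = [qi / norm for qi in q]
        basis.append(q)
        # Remove the component of the residual along the new direction.
        coef = sum(ri * qi for ri, qi in zip(resid, q))
        resid = [ri - coef * qi for ri, qi in zip(resid, q)]

    rss = sum(r * r for r in resid)
    return 1.0 - rss / tss


def mean_r_squared(n, k, reps, seed=0):
    """Average R^2 over `reps` regressions of pure noise on pure noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        total += r_squared(y, cols)
    return total / reps


if __name__ == "__main__":
    n, k = 100, 20
    print("simulated mean R^2:", mean_r_squared(n, k, reps=200))
    print("k / (n - 1)       :", k / (n - 1))
    print("k / (n - k - 1)   :", k / (n - k - 1))
```

The regression is solved by orthogonalizing the centered regressors so the example needs nothing beyond the standard library; a linear algebra package would do the same job with less code.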
