In a linear regression context: is the sample $\widehat{R^2}$ a consistent estimator of the population parameter $R^2$? Maybe this depends on distributional assumptions?
1 Answer
As whuber noted, consistency of the $\widehat{R^2}$ should be first examined under the assumption of correct specification, as we usually do with all estimators. It is a separate matter to examine what happens to consistency under misspecification, i.e. under the inclusion of irrelevant variables or the exclusion of relevant variables, or functional misspecification.
The population $R^2$, in a $y = X\beta +u$ framework, can be defined as
$$R^2_{pop} \equiv \frac {\text{Var}(X\beta)}{\text{Var}(y)}=1-\frac{\text{Var}(u)}{\text{Var}(y)} = 1-\frac{\sigma^2}{\text{Var}(y)}$$
By writing the above, we essentially assume that $\text{Var}(y)$ exists and is finite; the second equality uses $\text{Var}(y) = \text{Var}(X\beta) + \text{Var}(u)$, which holds when the regressors and the error term are uncorrelated.
The sample estimator can be written
$$\widehat{R^2} = 1- \frac{[1/(n-k)]\sum\hat u_i^2}{[1/(n-k)]\sum(y_i-\bar y)^2}$$
Under the standard assumptions and correct specification, $[1/(n-k)]\sum\hat u_i^2 \xrightarrow{p} \sigma^2$. Also, we have assumed an i.i.d. sample and that the variance of $y$ exists and is finite. Therefore the sample analogue of this variance is a consistent estimator of it, $[1/(n-k)]\sum(y_i-\bar y)^2 \xrightarrow{p}\text{Var}(y)$ (the factor $n/(n-k)\to 1$ for fixed $k$).
So, under correct specification, and since the probability limit of the ratio equals the ratio of the probability limits (both limits here being finite constants),
$$\widehat{R^2} \xrightarrow{p} R^2_{pop}.$$
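As a quick numerical check (not part of the original argument), here is a minimal simulation sketch in Python under the assumptions above: i.i.d. data, Gaussian errors, and a correctly specified linear model, with parameter values chosen purely for illustration. As $n$ grows, the sample $\widehat{R^2}$ should settle near the population value.

```python
# Minimal sketch: sample R^2 approaches the population R^2 as n grows,
# under correct specification. Model and parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([1.0, 2.0, -1.5])      # coefficients, including intercept
sigma2 = 4.0                           # error variance
var_xb = 2.0**2 + 1.5**2               # Var(X beta) for independent standard-normal regressors
r2_pop = var_xb / (var_xb + sigma2)    # population R^2

for n in (100, 1_000, 10_000, 100_000):
    X = np.column_stack([np.ones(n), rng.standard_normal(n), rng.standard_normal(n)])
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    r2_hat = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    print(f"n={n:>7}  R2_hat={r2_hat:.4f}  R2_pop={r2_pop:.4f}")
```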
Inclusion of irrelevant variables (or exclusion of relevant variables), i.e. misspecification of the regressor matrix, affects only the error-variance estimator, not the estimator of the variance of the dependent variable, since the latter is computed from the data on $y$ alone, not on $X$. So whenever misspecification renders the error-variance estimator inconsistent, the $\widehat{R^2}$ estimator will be inconsistent as well.
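To illustrate that last point, a hedged sketch using the same illustrative setup as above: omitting the relevant regressor $x_2$ inflates the probability limit of the error-variance estimator from $\sigma^2$ to $\sigma^2 + \beta_2^2\,\text{Var}(x_2)$ (here $x_1$ and $x_2$ are independent, so there is no omitted-variable bias in the remaining coefficient), and so $\widehat{R^2}$ converges to a value below the population $R^2$, no matter how large the sample.

```python
# Sketch of inconsistency under exclusion of a relevant regressor (illustrative values):
# the residual-variance estimator converges to sigma^2 + beta2^2 * Var(x2),
# so the sample R^2 settles below the population R^2.
import numpy as np

rng = np.random.default_rng(1)
beta1, beta2, sigma2 = 2.0, -1.5, 4.0
var_xb = beta1**2 + beta2**2               # regressors are independent standard normals
r2_pop = var_xb / (var_xb + sigma2)        # population R^2 of the true model

n = 200_000
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(scale=np.sqrt(sigma2), size=n)

X_short = np.column_stack([np.ones(n), x1])          # x2 wrongly excluded
b, *_ = np.linalg.lstsq(X_short, y, rcond=None)
resid = y - X_short @ b
r2_hat = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r2_limit = beta1**2 / (var_xb + sigma2)              # plim of R^2_hat under this omission
print(f"R2_hat={r2_hat:.4f}  plim under omission={r2_limit:.4f}  R2_pop={r2_pop:.4f}")
```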

+1 Would you need to invoke Slutsky to have that it's sufficient to have the numerator and denominator converge to the right things to have that the ratio will converge to the ratio of those, or is it otherwise obvious that the ratio will converge to the right thing if the numerator and denominator do? (... You might want to fix 'varinace' if you happen to edit again) – Glen_b Oct 18 '14 at 01:20
@Glen_b Slutsky is not needed here, since it is concerned with the case where one of the estimators converges to a random variable. The fact that the plim of the ratio equals the ratio of the plims when both plims are constants is a consequence of the Continuous Mapping (Mann-Wald) Theorem. – Alecos Papadopoulos Oct 18 '14 at 13:47