What is estimated by $R^2$?

Question

The coefficient of determination, $R^2$, is an empirical quantity. What population quantity does it estimate and are there other estimators for this quantity? I am particularly interested in the fixed design, and not the multiple correlation coefficient, which corresponds to a random design.

References appreciated.

What do you mean, precisely, when you say $R^2$ is an *empirical* quantity? — AdamO, Aug 05 '15 at 19:52
@AdamO That means $R^2$ is being considered as a descriptive property of data (in contradistinction to an analogous property of any linear model whose response has a finite variance). — whuber, Aug 05 '15 at 21:07
@whuber I think "empirical" is used (here) to mean "representative" or simple random sample as opposed to a stratified or blocked experimental design. However, empirical estimators of quantities such as U statistics, or jackknife/bootstrap variance are still useful even for fixed designs. Generally speaking, in either type of study, the $X$s are considered fixed/given, so I'm not really catching wind of the distinction. — AdamO, Aug 05 '15 at 21:36
@AdamO A quick Google search (for "empirical quantity") turned up a book *Statistics* by H. T. Hayslett with a clear definition: "...we will test a hypothesis about a theoretical quantity whose value is unknown. This ... is known as a **parameter** ... The hypothesis about the population quantity will be tested by means of a sample quantity (or empirical quantity) known as a **statistic.** A statistic is some quantity which is calculated from the observations composing a sample." This usage accords with my understanding of "empirical": it distinguishes between a statistic and a parameter. — whuber, Aug 05 '15 at 21:55

Michael M · Answer 1 · 2015-08-05T21:14:05.937

The sample R-squared is an estimator of the true R-squared $\theta$, which is the true proportion of variance (of the response) explained by variation in the regressors.

The sample R-squared is usually slightly positively biased, which is problematic only in very small samples: if the number of fitted coefficients (intercept included) equals the number of observations $n$ and the design has full rank, it is automatically 1, even if there is no true linear relation to the response. The larger the sample, the smaller the bias.
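To get a feel for the size of this bias, here is a minimal simulation sketch. It assumes a fixed design drawn once, i.i.d. standard normal errors, and a response unrelated to the regressors, so the true value is 0. The function names and the choice of $p = 3$ regressors are illustrative assumptions, not part of the answer above.

```python
import numpy as np


def r_squared(X, y):
    """Sample R^2 from an OLS fit of y on the columns of X plus an intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def mean_null_r_squared(n, p, n_sims=2000, seed=0):
    """Average sample R^2 over simulated responses that ignore the regressors."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))  # design drawn once, then held fixed
    values = [r_squared(X, rng.normal(size=n)) for _ in range(n_sims)]
    return float(np.mean(values))


# Simulated mean of R^2 next to p / (n - 1), its expectation under
# normal errors when there is no true linear relation.
for n in (10, 30, 100):
    print(n, round(mean_null_r_squared(n, p=3), 3), round(3 / (n - 1), 3))

# Saturated fit: 4 regressors plus an intercept on 5 observations.
rng = np.random.default_rng(1)
saturated = r_squared(rng.normal(size=(5, 4)), rng.normal(size=5))
print(saturated)
```

Under these assumptions the simulated mean should track $p/(n-1)$ and shrink as $n$ grows, while the saturated fit returns 1 up to rounding error.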

An alternative estimator of $\theta$ is the adjusted R-squared, which is unbiased at least in the special case of no true linear relation between regressors and response.
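As a sketch of why this holds, write $n$ for the number of observations and $p$ for the number of regressors besides the intercept (notation assumed here, not taken from the answer). The usual adjusted R-squared is

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}.$$

Assuming i.i.d. normal errors and no true linear relation, $R^2 \sim \mathrm{Beta}\!\left(\tfrac{p}{2}, \tfrac{n-p-1}{2}\right)$, so $E[R^2] = \tfrac{p}{n-1}$ and

$$E[\bar{R}^2] = 1 - \left(1 - \frac{p}{n-1}\right)\frac{n-1}{n-p-1} = 1 - \frac{n-1-p}{n-p-1} = 0 = \theta.$$

The same calculation quantifies the bias of the unadjusted $R^2$ in this special case: it is exactly $p/(n-1)$, which vanishes as $n$ grows with $p$ fixed.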

I am looking for some quantification of the phenomena you mention. — JohnRos, Aug 06 '15 at 12:56
