The coefficient of determination, $R^2$, is an empirical quantity. What population quantity does it estimate and are there other estimators for this quantity? I am particularly interested in the fixed design, and not the multiple correlation coefficient, which corresponds to a random design.

References appreciated.

JohnRos
  • What do you mean, precisely, when you say $R^2$ is an *empirical* quantity? – AdamO Aug 05 '15 at 19:52
  • @AdamO That means $R^2$ is being considered as a descriptive property of data (in contradistinction to an analogous property of any linear model whose response has a finite variance). – whuber Aug 05 '15 at 21:07
  • @whuber I think "empirical" is used (here) to mean "representative" or simple random sample as opposed to a stratified or blocked experimental design. However, empirical estimators of quantities such as U statistics, or jackknife/bootstrap variance are still useful even for fixed designs. Generally speaking, in either type of study, the $X$s are considered fixed/given, so I'm not really catching wind of the distinction. – AdamO Aug 05 '15 at 21:36
  • @AdamO A quick Google search (for "empirical quantity") turned up a book *Statistics* by H. T. Hayslett with a clear definition: "...we will test a hypothesis about a theoretical quantity whose value is unknown. This ... is known as a **parameter** ... The hypothesis about the population quantity will be tested by means of a sample quantity (or empirical quantity) known as a **statistic.** A statistic is some quantity which is calculated from the observations composing a sample." This usage accords with my understanding of "empirical": it distinguishes between a statistic and a parameter. – whuber Aug 05 '15 at 21:55

1 Answer

The sample R-squared is an estimator of the true R-squared $\theta$, which is the true proportion of the variance of the response that is explained by variation in the regressors.

The sample R-squared is usually slightly positively biased, which is problematic only in very small samples. In the extreme case where the number of fitted coefficients (intercept included) equals the number of observations $n$, it is automatically 1, even if there is no true linear relation to the response. The larger the sample, the smaller the bias.
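The bias can be checked with a small simulation. The following is a minimal sketch, not part of the original answer: it assumes a fixed design with an intercept and $p$ regressors, Gaussian noise, and no true relation (so $\theta = 0$). The helper names `r_squared` and `mean_null_r_squared` are hypothetical. Under these assumptions the average sample R-squared should land near $p/(n-1)$ rather than 0.

```python
import numpy as np

def r_squared(X, y):
    """Sample R^2 from an OLS fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def mean_null_r_squared(n, p, reps=5000, seed=0):
    """Average R^2 over many pure-noise responses for one fixed design."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))  # drawn once, then held fixed across replications
    return float(np.mean([r_squared(X, rng.normal(size=n)) for _ in range(reps)]))

p = 3
avg_small = mean_null_r_squared(n=10, p=p)   # theory (assumed setup): p / (n - 1)
avg_large = mean_null_r_squared(n=200, p=p)  # bias shrinks as n grows
print(avg_small, p / (10 - 1))
print(avg_large, p / (200 - 1))

# Saturated fit: intercept plus n - 1 regressors reproduces any response exactly.
rng = np.random.default_rng(1)
n = 10
r2_sat = r_squared(rng.normal(size=(n, n - 1)), rng.normal(size=n))
print(r2_sat)
```

The last lines illustrate the saturated case: with as many coefficients as observations, the fit interpolates the data and the sample R-squared is 1 up to floating-point error, even though the response is pure noise.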

An alternative estimator of $\theta$ is the adjusted R-squared, which is unbiased at least in the special case of no true linear relation between regressors and response.
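A sketch of why this holds, under assumptions that are not stated in the answer: a model with an intercept and $p$ regressors, a full-rank fixed design with $n$ observations, and i.i.d. Gaussian errors. The symbols $p$ and $\bar R^2$ are introduced here for illustration. The adjusted R-squared is

$$\bar R^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}.$$

If all slopes are zero, then $R^2 \sim \mathrm{Beta}\!\left(\tfrac{p}{2}, \tfrac{n-p-1}{2}\right)$, so $E[R^2] = \tfrac{p}{n-1}$ and $E[1 - R^2] = \tfrac{n-p-1}{n-1}$. Substituting,

$$E[\bar R^2] = 1 - \frac{n-p-1}{n-1}\cdot\frac{n-1}{n-p-1} = 0 = \theta.$$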

Michael M