
Consider the simple linear model:

$$\pmb{y}=X\pmb{\beta}+\pmb{\epsilon}$$

where the $\epsilon_i$ are $\mathrm{i.i.d.}\;\mathcal{N}(0,\sigma^2)$, $X\in\mathbb{R}^{n\times p}$ with $p\geq2$, and $X$ contains a column of constants (the intercept).

My question is: given $\mathrm{E}(X'X)$, $\beta$, and $\sigma$, is there a formula for a non-trivial upper bound on $\mathrm{E}(R^2)$* (assuming the model is estimated by OLS)?

*I assumed, when writing this, that obtaining $\mathrm{E}(R^2)$ itself would not be possible.

EDIT 1:

Using the solution derived by Stéphane Laurent (see his answer below), we can get a non-trivial upper bound on $\mathrm{E}(R^2)$. Numerical simulations (below) show that this bound is actually pretty tight.

Stéphane Laurent derived the following: $R^2\sim\mathrm{B}\left(\tfrac{p-1}{2},\tfrac{n-p}{2},\lambda\right)$, where $\mathrm{B}\left(\tfrac{p-1}{2},\tfrac{n-p}{2},\lambda\right)$ is a non-central Beta distribution with non-centrality parameter

$$\lambda=\frac{\Vert X\beta-\overline{X\beta}\,1_n\Vert^2}{\sigma^2}$$

where $\overline{X\beta}$ denotes the mean of the entries of the vector $X\beta$ and $1_n$ is the $n$-vector of ones (in the notation of the answer below, $\lambda=\Vert P_Z\mu\Vert^2/\sigma^2$ with $\mu=X\beta$).

So

$$\mathrm{E}(R^2)=\mathrm{E}\left(\frac{\chi^2_{p-1}(\lambda)}{\chi^2_{p-1}(\lambda)+\chi^2_{n-p}}\right)\leq\frac{\mathrm{E}\left(\chi^2_{p-1}(\lambda)\right)}{\mathrm{E}\left(\chi^2_{p-1}(\lambda)\right)+\mathrm{E}\left(\chi^2_{n-p}\right)}$$

where $\chi^2_{k}(\lambda)$ is a non-central $\chi^2$ with non-centrality parameter $\lambda$ and $k$ degrees of freedom. (Heuristically, the inequality goes in this direction because, writing $A=\chi^2_{p-1}(\lambda)$ and $B=\chi^2_{n-p}$, we have $\mathrm{E}\left(\frac{A}{A+B}\,(A+B)\right)=\mathrm{E}(A)$, and the ratio $\frac{A}{A+B}$ and the sum $A+B$ are independent in the central case but positively correlated in the non-central case.) Since $\mathrm{E}(\chi^2_{k}(\lambda))=k+\lambda$, a non-trivial upper bound for $\mathrm{E}(R^2)$ is

$$\frac{\lambda+p-1}{\lambda+n-1}$$
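In R, the bound is a one-liner (a small helper; the name r2_bound is mine, not from the thread):

# upper bound on E(R^2): (lambda + p - 1) / (lambda + n - 1)
r2_bound <- function(lambda, p, n) (lambda + p - 1) / (lambda + n - 1)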

It is very tight (much tighter than I had expected would be possible). For example, using:

rho <- 0.75                       # common correlation between the regressors
p <- 10                           # number of coefficients (including the intercept)
n <- 25 * p                       # sample size
Su <- matrix(rho, p - 1, p - 1)   # covariance matrix of the p - 1 regressors
diag(Su) <- 1
su <- 1                           # sigma, the error standard deviation
set.seed(123)
bet <- runif(p)                   # beta, the coefficient vector

the mean of $R^2$ over 1000 simulations is 0.960819, while the theoretical upper bound above gives 0.9609081. The bound seems to be equally precise across many values of $R^2$. Truly astounding!
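The simulation code itself is not shown above; the following is a minimal sketch that reproduces this comparison, assuming the regressors are drawn from $\mathcal{N}(0,\texttt{Su})$ and the design is held fixed across replications (so that $\lambda$ is well defined), with the parameters and r2_bound defined above:

library(MASS)  # for mvrnorm

X <- cbind(1, mvrnorm(n, rep(0, p - 1), Su))  # fixed design: intercept + regressors
mu <- X %*% bet                               # conditional mean of y
lambda <- sum((mu - mean(mu))^2) / su^2       # non-centrality parameter

R2 <- replicate(1000, {
  y <- mu + rnorm(n, 0, su)
  summary(lm(y ~ X[, -1]))$r.squared
})

c(mean(R2), r2_bound(lambda, p, n))           # empirical mean vs. upper bound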

EDIT 2:

After further research, it appears that the quality of the upper-bound approximation to $\mathrm{E}(R^2)$ improves as $\lambda+p$ increases (and, all else equal, $\lambda$ increases with $n$).

user603
  • $R^2$ has a Beta distribution with parameters depending only on $n$ and $p$. No? – Stéphane Laurent May 04 '13 at 18:46
  • 1
    Oooppss sorry, my previous claim is true only under the hypothesis of the "null model" (intercept only). Otherwise the distribution of $R^2$ should be something like a noncentral Beta distribution, with a noncentrality parameter involving the unknown parameters. – Stéphane Laurent May 04 '13 at 18:56
  • @StéphaneLaurent: thanks. Would you know more about the relationship between the unknown parameters and the parameters of the Beta? I'm stuck, so any pointer would be welcome... – user603 May 04 '13 at 18:57
  • Do you absolutely need to deal with $E[R^2]$? Perhaps there is a simple exact formula for $E[R^2/(1-R^2)]$. – Stéphane Laurent May 05 '13 at 08:20
  • @StéphaneLaurent: it is not clear to me that $E(R^2/(1-R^2))$ would be simpler to derive than $E(R^2)$. Also, provided that $n$ is large, the approximation you suggested seems to work really well. – user603 May 05 '13 at 20:36
  • 1
    With the notations of my answer, $R^2/(1-R^2) = k F$ for some scalar $k$ and the first moment of the noncentral $F$-distribution is simple. – Stéphane Laurent May 05 '13 at 21:58
  • Trying to follow your equations, and a bit puzzled by the dimensions. I assume vectors are in column form, so $\beta$ is of size $(p,1)$. It follows that $X'$ is $(n,p)$, so $X$ is $(p,n)$, while you show $(n,p)$. Moving along, in the definition of $\lambda$, $X'\beta$ must be $(n,1)$, so $\mathrm{E}(X)'\beta 1_n$ must be $(n,1)$, but $1_n$ is $(n,n)$, so I'm not sure how that is going to work out. I was actually going to use this formula to calculate $\lambda$, so I would appreciate it if someone could guide me to the precise formula. – radumanolescu Jun 23 '19 at 02:49

1 Answer


Any linear model can be written $\boxed{Y=\mu+\sigma G}$ where $G$ has the standard normal distribution on $\mathbb{R}^n$ and $\mu$ is assumed to belong to a linear subspace $W$ of $\mathbb{R}^n$. In your case $W=\text{Im}(X)$.

Let $[1] \subset W$ be the one-dimensional linear subspace generated by the vector $(1,1,\ldots,1)$. For the hypothesis test of $H_0\colon\{\mu \in U\}$, where $U\subset W$ is a linear subspace, denote by $Z=U^\perp \cap W$ the orthogonal complement of $U$ in $W$, and set $m=\dim(W)$ and $\ell=\dim(U)$. The classical Fisher statistic is $$ F = \frac{{\Vert P_Z Y\Vert}^2/(m-\ell)}{{\Vert P_W^\perp Y\Vert}^2/(n-m)}. $$ Taking $U=[1]$ (so $m=p$ and $\ell=1$ in your situation), the $R^2$ is closely related to $F$.

Indeed, $$ \dfrac{{\Vert P_Z Y\Vert}^2}{{\Vert P_W^\perp Y\Vert}^2} = \frac{R^2}{1-R^2} $$ because the definition of $R^2$ is $$R^2 = \frac{{\Vert P_Z Y\Vert}^2}{{\Vert P_U^\perp Y\Vert}^2}=1 - \frac{{\Vert P^\perp_W Y\Vert}^2}{{\Vert P_U^\perp Y\Vert}^2}$$ (the second equality is Pythagoras: $U^\perp$ is the orthogonal sum of $Z$ and $W^\perp$, so ${\Vert P_U^\perp Y\Vert}^2 = {\Vert P_Z Y\Vert}^2 + {\Vert P_W^\perp Y\Vert}^2$).
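These identities are easy to verify numerically; here is a small sketch with a toy design of my own (not the question's):

set.seed(1)
n <- 20
X <- cbind(1, rnorm(n), rnorm(n))         # toy design; W = Im(X)
y <- rnorm(n)
PW <- X %*% solve(crossprod(X)) %*% t(X)  # orthogonal projection onto W
PU <- matrix(1 / n, n, n)                 # orthogonal projection onto U = [1]
PZ <- PW - PU                             # projection onto Z, the complement of U in W
R2_proj <- sum((PZ %*% y)^2) / sum((y - PU %*% y)^2)
R2_lm <- summary(lm(y ~ X[, -1]))$r.squared
all.equal(R2_proj, R2_lm)                 # TRUE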

Obviously $\boxed{P_Z Y = P_Z \mu + \sigma P_Z G}$ and $\boxed{P_W^\perp Y = \sigma P_W^\perp G}$.

When $H_0\colon\{\mu \in U\}$ is true then $P_Z \mu = 0$ and therefore $$ F = \frac{{\Vert P_Z G\Vert}^2/(m-\ell)}{{\Vert P_W^\perp G\Vert}^2/(n-m)} \sim F_{m-\ell,n-m} $$ has the Fisher $F_{m-\ell,n-m}$ distribution. Consequently, from the classical relation between the Fisher distribution and the Beta distribution, $R^2 \sim {\cal B}\left(\tfrac{m-\ell}{2}, \tfrac{n-m}{2}\right)$.

In the general situation we have to deal with $P_Z Y = P_Z \mu + \sigma P_Z G$ when $P_Z\mu \neq 0$. In this general case one has ${\Vert P_Z Y\Vert}^2 \sim \sigma^2\chi^2_{m-\ell}(\lambda)$, the noncentral $\chi^2$ distribution with $m-\ell$ degrees of freedom and noncentrality parameter $\boxed{\lambda=\frac{{\Vert P_Z \mu\Vert}^2}{\sigma^2}}$, and then $\boxed{F \sim F_{m-\ell,n-m}(\lambda)}$ (noncentral Fisher distribution). This is the classical result used to compute the power of $F$-tests.
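For instance, such power computations reduce to R's noncentral $F$ functions (a sketch; $m=p=10$, $\ell=1$, $n=250$ are taken from the question, while the value of $\lambda$ here is arbitrary):

m <- 10; l <- 1; n <- 250
lambda <- 5                                           # arbitrary illustrative value
crit <- qf(0.95, df1 = m - l, df2 = n - m)            # 5% critical value under H0
1 - pf(crit, df1 = m - l, df2 = n - m, ncp = lambda)  # power of the F-test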

The classical relation between the Fisher distribution and the Beta distribution holds in the noncentral situation too. Hence $R^2$ has the noncentral Beta distribution with shape parameters $\frac{m-\ell}{2}$ and $\frac{n-m}{2}$ and noncentrality parameter $\lambda$. I think the moments are available in the literature, but they are possibly quite complicated.
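This noncentral Beta is available directly in R (the ncp argument of pbeta/rbeta follows the same convention), so the distributional claim can be checked against simulated $R^2$ values, e.g. those generated by the sketch in the question (assuming p, n, lambda, and R2 from that sketch are in the workspace):

r2_theory <- rbeta(1000, shape1 = (p - 1) / 2, shape2 = (n - p) / 2, ncp = lambda)
qqplot(R2, r2_theory)  # points near the diagonal support the distributional claim
abline(0, 1)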

Finally, let us write down $P_Z\mu$. Note that $P_Z = P_W - P_U$. One has $P_U \mu = \bar\mu 1$ when $U=[1]$, where $\bar\mu$ is the mean of the entries of $\mu$, and $P_W \mu = \mu$. Hence $P_Z \mu = \mu - \bar\mu 1$, where $\mu=X\beta$ for the unknown parameter vector $\beta$.

Stéphane Laurent