
Consider the simple linear model:

$$\pmb{y}=X\pmb{\beta}+\pmb{\epsilon}$$

where the $\epsilon_i$ are $\mathrm{i.i.d.}\;\mathcal{N}(0,\sigma^2)$, $X\in\mathbb{R}^{n\times p}$ with $p\geq2$, and $X$ contains a column of constants (the intercept).

My question is: given $\mathrm{E}(X'X)$, $\beta$, and $\sigma$, is there a formula for a non-trivial upper bound on $\mathrm{E}(R^2)$* (assuming the model is estimated by OLS)?

*I assumed, when writing this, that obtaining $\mathrm{E}(R^2)$ itself would not be possible.

EDIT 1:

Using the solution derived by Stéphane Laurent (see his answer below), we can get a non-trivial upper bound on $\mathrm{E}(R^2)$. Numerical simulations (below) show that this bound is actually pretty tight.

Stéphane Laurent derived the following: $R^2\sim\mathrm{B}\left(\tfrac{p-1}{2},\tfrac{n-p}{2},\lambda\right)$, where $\mathrm{B}\left(\tfrac{p-1}{2},\tfrac{n-p}{2},\lambda\right)$ is a non-central Beta distribution with non-centrality parameter

$$\lambda=\frac{\Vert X\beta-\overline{X\beta}\,1_n\Vert^2}{\sigma^2}$$

where $\overline{X\beta}$ denotes the mean of the entries of the vector $X\beta$ and $1_n$ is the $n$-vector of ones (in the notation of the answer below, $\lambda=\Vert P_Z\mu\Vert^2/\sigma^2$ with $\mu=X\beta$).

So

$$\mathrm{E}(R^2)=\mathrm{E}\left(\frac{\chi^2_{p-1}(\lambda)}{\chi^2_{p-1}(\lambda)+\chi^2_{n-p}}\right)\leq\frac{\mathrm{E}\left(\chi^2_{p-1}(\lambda)\right)}{\mathrm{E}\left(\chi^2_{p-1}(\lambda)\right)+\mathrm{E}\left(\chi^2_{n-p}\right)}$$

where $\chi^2_{k}(\lambda)$ is a non-central $\chi^2$ with non-centrality parameter $\lambda$ and $k$ degrees of freedom. (Heuristically, the inequality goes in this direction because, writing $A=\chi^2_{p-1}(\lambda)$ and $B=\chi^2_{n-p}$, we have $\mathrm{E}\left(\frac{A}{A+B}\,(A+B)\right)=\mathrm{E}(A)$, and the ratio $\frac{A}{A+B}$ and the sum $A+B$ are independent in the central case but positively correlated in the non-central case.) Since $\mathrm{E}(\chi^2_{k}(\lambda))=k+\lambda$, a non-trivial upper bound for $\mathrm{E}(R^2)$ is

$$\frac{\lambda+p-1}{\lambda+n-1}$$
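In R, the bound is a one-liner (a small helper; the name r2_bound is mine, not from the thread):

# upper bound on E(R^2): (lambda + p - 1) / (lambda + n - 1)
r2_bound <- function(lambda, p, n) (lambda + p - 1) / (lambda + n - 1)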

It is very tight (much tighter than I had expected would be possible). For example, using:

rho <- 0.75                       # common correlation between the regressors
p <- 10                           # number of coefficients (including the intercept)
n <- 25 * p                       # sample size
Su <- matrix(rho, p - 1, p - 1)   # covariance matrix of the p - 1 regressors
diag(Su) <- 1
su <- 1                           # sigma, the error standard deviation
set.seed(123)
bet <- runif(p)                   # beta, the coefficient vector

the mean of $R^2$ over 1000 simulations is 0.960819, while the theoretical upper bound above gives 0.9609081. The bound seems to be equally precise across many values of $R^2$. Truly astounding!
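The simulation code itself is not shown above; the following is a minimal sketch that reproduces this comparison, assuming the regressors are drawn from $\mathcal{N}(0,\texttt{Su})$ and the design is held fixed across replications (so that $\lambda$ is well defined), with the parameters and r2_bound defined above:

library(MASS)  # for mvrnorm

X <- cbind(1, mvrnorm(n, rep(0, p - 1), Su))  # fixed design: intercept + regressors
mu <- X %*% bet                               # conditional mean of y
lambda <- sum((mu - mean(mu))^2) / su^2       # non-centrality parameter

R2 <- replicate(1000, {
  y <- mu + rnorm(n, 0, su)
  summary(lm(y ~ X[, -1]))$r.squared
})

c(mean(R2), r2_bound(lambda, p, n))           # empirical mean vs. upper bound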

EDIT 2:

After further research, it appears that the quality of the upper-bound approximation to $\mathrm{E}(R^2)$ improves as $\lambda+p$ increases (and, all else equal, $\lambda$ increases with $n$).

user603
  • $R^2$ has a Beta distribution with parameters depending only on $n$ and $p$. No? – Stéphane Laurent May 04 '13 at 18:46
  • 1
    Oooppss sorry, my previous claim is true only under the hypothesis of the "null model" (intercept only). Otherwise the distribution of $R^2$ should be something like a noncentral Beta distribution, with a noncentrality parameter involving the unknown parameters. – Stéphane Laurent May 04 '13 at 18:56
  • @StéphaneLaurent: thanks. Would you know more about the relationship between the unknown parameters and the parameters of the Beta? I'm stuck, so any pointer would be welcome... – user603 May 04 '13 at 18:57
  • Do you absolutely need to deal with $E[R^2]$? Perhaps there is a simple exact formula for $E[R^2/(1-R^2)]$. – Stéphane Laurent May 05 '13 at 08:20
  • @StéphaneLaurent: it is not clear to me that $E(R^2/(1-R^2))$ would be simpler to derive than $E(R^2)$. Also, provided that $n$ is large, the approximation you suggested seems to work really well. – user603 May 05 '13 at 20:36
  • 1
    With the notations of my answer, $R^2/(1-R^2) = k F$ for some scalar $k$ and the first moment of the noncentral $F$-distribution is simple. – Stéphane Laurent May 05 '13 at 21:58
  • Trying to follow your equations, and a bit puzzled by the dimensions. I assume vectors are in column form, so $\beta$ is of size $(p,1)$. It follows that $X'$ is $(n,p)$, so $X$ is $(p,n)$, while you show $(n,p)$. Moving along, in the definition of $\lambda$, $X'\beta$ must be $(n,1)$, so $\mathrm{E}(X)'\beta 1_n$ must be $(n,1)$, but $1_n$ is $(n,n)$, so I'm not sure how that is going to work out. I was actually going to use this formula to calculate $\lambda$, so I would appreciate it if someone could guide me to the precise formula. – radumanolescu Jun 23 '19 at 02:49

1 Answer


Any linear model can be written $\boxed{Y=\mu+\sigma G}$ where $G$ has the standard normal distribution on $\mathbb{R}^n$ and $\mu$ is assumed to belong to a linear subspace $W$ of $\mathbb{R}^n$. In your case $W=\text{Im}(X)$.

Let $[1] \subset W$ be the one-dimensional linear subspace generated by the vector $(1,1,\ldots,1)$. For the hypothesis test of $H_0\colon\{\mu \in U\}$, where $U\subset W$ is a linear subspace, denote by $Z=U^\perp \cap W$ the orthogonal complement of $U$ in $W$, and set $m=\dim(W)$ and $\ell=\dim(U)$. The classical Fisher statistic is $$ F = \frac{{\Vert P_Z Y\Vert}^2/(m-\ell)}{{\Vert P_W^\perp Y\Vert}^2/(n-m)}. $$ Taking $U=[1]$ (so $m=p$ and $\ell=1$ in your situation), the $R^2$ is closely related to $F$.

Indeed, $$ \dfrac{{\Vert P_Z Y\Vert}^2}{{\Vert P_W^\perp Y\Vert}^2} = \frac{R^2}{1-R^2} $$ because the definition of $R^2$ is $$R^2 = \frac{{\Vert P_Z Y\Vert}^2}{{\Vert P_U^\perp Y\Vert}^2}=1 - \frac{{\Vert P^\perp_W Y\Vert}^2}{{\Vert P_U^\perp Y\Vert}^2}$$ (the second equality is Pythagoras: $U^\perp$ is the orthogonal sum of $Z$ and $W^\perp$, so ${\Vert P_U^\perp Y\Vert}^2 = {\Vert P_Z Y\Vert}^2 + {\Vert P_W^\perp Y\Vert}^2$).
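These identities are easy to verify numerically; here is a small sketch with a toy design of my own (not the question's):

set.seed(1)
n <- 20
X <- cbind(1, rnorm(n), rnorm(n))         # toy design; W = Im(X)
y <- rnorm(n)
PW <- X %*% solve(crossprod(X)) %*% t(X)  # orthogonal projection onto W
PU <- matrix(1 / n, n, n)                 # orthogonal projection onto U = [1]
PZ <- PW - PU                             # projection onto Z, the complement of U in W
R2_proj <- sum((PZ %*% y)^2) / sum((y - PU %*% y)^2)
R2_lm <- summary(lm(y ~ X[, -1]))$r.squared
all.equal(R2_proj, R2_lm)                 # TRUE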

Obviously $\boxed{P_Z Y = P_Z \mu + \sigma P_Z G}$ and $\boxed{P_W^\perp Y = \sigma P_W^\perp G}$.

When $H_0\colon\{\mu \in U\}$ is true then $P_Z \mu = 0$ and therefore $$ F = \frac{{\Vert P_Z G\Vert}^2/(m-\ell)}{{\Vert P_W^\perp G\Vert}^2/(n-m)} \sim F_{m-\ell,n-m} $$ has the Fisher $F_{m-\ell,n-m}$ distribution. Consequently, from the classical relation between the Fisher distribution and the Beta distribution, $R^2 \sim {\cal B}\left(\tfrac{m-\ell}{2}, \tfrac{n-m}{2}\right)$.

In the general situation we have to deal with $P_Z Y = P_Z \mu + \sigma P_Z G$ when $P_Z\mu \neq 0$. In this general case one has ${\Vert P_Z Y\Vert}^2 \sim \sigma^2\chi^2_{m-\ell}(\lambda)$, the noncentral $\chi^2$ distribution with $m-\ell$ degrees of freedom and noncentrality parameter $\boxed{\lambda=\frac{{\Vert P_Z \mu\Vert}^2}{\sigma^2}}$, and then $\boxed{F \sim F_{m-\ell,n-m}(\lambda)}$ (noncentral Fisher distribution). This is the classical result used to compute the power of $F$-tests.
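For instance, such power computations reduce to R's noncentral $F$ functions (a sketch; $m=p=10$, $\ell=1$, $n=250$ are taken from the question, while the value of $\lambda$ here is arbitrary):

m <- 10; l <- 1; n <- 250
lambda <- 5                                           # arbitrary illustrative value
crit <- qf(0.95, df1 = m - l, df2 = n - m)            # 5% critical value under H0
1 - pf(crit, df1 = m - l, df2 = n - m, ncp = lambda)  # power of the F-test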

The classical relation between the Fisher distribution and the Beta distribution holds in the noncentral situation too. Hence $R^2$ has the noncentral Beta distribution with shape parameters $\frac{m-\ell}{2}$ and $\frac{n-m}{2}$ and noncentrality parameter $\lambda$. I think the moments are available in the literature, but they are possibly quite complicated.
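This noncentral Beta is available directly in R (the ncp argument of pbeta/rbeta follows the same convention), so the distributional claim can be checked against simulated $R^2$ values, e.g. those generated by the sketch in the question (assuming p, n, lambda, and R2 from that sketch are in the workspace):

r2_theory <- rbeta(1000, shape1 = (p - 1) / 2, shape2 = (n - p) / 2, ncp = lambda)
qqplot(R2, r2_theory)  # points near the diagonal support the distributional claim
abline(0, 1)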

Finally, let us write down $P_Z\mu$. Note that $P_Z = P_W - P_U$. One has $P_U \mu = \bar\mu 1$ when $U=[1]$, where $\bar\mu$ is the mean of the entries of $\mu$, and $P_W \mu = \mu$. Hence $P_Z \mu = \mu - \bar\mu 1$, where $\mu=X\beta$ for the unknown parameter vector $\beta$.

Stéphane Laurent