
It is assumed that:

1) $y=q+u$

where $q$ is productivity and $y$ is a test score that measures true productivity. $u$ is a normally distributed error term, independent of $q$, with zero mean and constant variance; $q$ is also assumed to be normally distributed with mean $\alpha$ and constant variance. The outcome of this is:

2) $E(q | y) = (1-\gamma)\alpha + \gamma y$
where $\gamma=\text{Var}(q)/(\text{Var}(q)+\text{Var}(u))$

How do you get Equation (2)? The equation can be expressed as a group effect and an individual effect.

[This is a model of statistical discrimination; see: Dennis J. Aigner and Glen G. Cain. Statistical theories of discrimination in labor markets. Industrial and Labor Relations Review, 30(2):175–187, January 1977. URL: http://ideas.repec.org/a/ilr/articl/v30y1977i2p175-187.html.]

Fusscreme

4 Answers


All you need to know is that the regression of $q$ on $y$ can be found by standardizing both variables, whereupon their correlation coefficient is the slope.

(In particular this result owes nothing to the assumptions that distributions are Normal; the independence of $q$ and $u$ is sufficient. Thus it will be most revealing to obtain it without recourse to any properties of Normal distributions.)


Preliminary Calculations

To standardize a variable, you subtract its expectation and divide by its standard deviation. We will therefore need to compute standard deviations, expectations, and a correlation coefficient.

Because $y=q+u$,

$$\mathbb{E}(y) = \mathbb{E}(q+u) = \mathbb{E}(q) + \mathbb{E}(u) = \alpha + 0 = \alpha,$$

taking care of computing the expectations.

Turn now to the standard deviations. Recall that it's simpler to work with their squares: the variances. For brevity, write $\sigma^2$ for the variance of $q$ and $\tau^2$ for the variance of $u$. Then

$$\text{Var}(y) = \text{Var}(q+u) = \text{Var}(q) + \text{Var}(u) + 2\text{Cov}(u,q) = \sigma^2 + \tau^2 + 0 = \sigma^2 + \tau^2.$$

Finally, the correlation is computed from the covariance:

$$\text{Cov}(y, q) = \text{Cov}(q+u, q) = \text{Cov}(q,q) + \text{Cov}(u,q) = \sigma^2.$$

(Both these calculations used the simplification $\text{Cov}(u,q)=0$ arising from the independence of $u$ and $q$.)

Therefore the standardized variables are $$\eta = (y-\alpha)/\sqrt{\sigma^2+\tau^2}$$ and $$\theta=(q-\alpha)/\sigma.$$

Moreover, the correlation is $$\rho=\sigma^2/\left(\sigma\sqrt{\sigma^2+\tau^2}\right) = \sigma / \sqrt{\sigma^2+\tau^2}.$$
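
These moment formulas are easy to check by simulation. Here is a minimal sketch, with $\alpha=2$, $\sigma^2=4$, $\tau^2=1$ chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
alpha, sigma2, tau2 = 2.0, 4.0, 1.0  # arbitrary illustrative values

q = rng.normal(alpha, np.sqrt(sigma2), size=n)  # productivity
u = rng.normal(0.0, np.sqrt(tau2), size=n)      # independent error
y = q + u                                       # observed test score

print(np.var(y))                # ~ sigma2 + tau2 = 5
print(np.cov(y, q)[0, 1])       # ~ sigma2 = 4
print(np.corrcoef(y, q)[0, 1])  # ~ sigma/sqrt(sigma2 + tau2) ≈ 0.894
```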


Solution

We have computed everything necessary to regress $q$ against $y$:

$$\mathbb{E}(\theta\ |\ \eta) = \rho\, \eta.$$

(This is a fact about geometry, really: see the "Conclusions" section at https://stats.stackexchange.com/a/71303 for the derivation, which--although it is illustrated there for Normal distributions--still does not require Normality to derive.)

Expanding, and once again exploiting linearity of expectation,

$$\frac{\mathbb{E}(q\ |\ y)-\alpha}{\sigma} = \mathbb{E}(\theta\ |\ \eta) = \rho\, \eta = \frac{\sigma}{\sqrt{\sigma^2+\tau^2}}\left(\frac{y-\alpha}{\sqrt{\sigma^2+\tau^2}}\right) = \frac{\sigma(y-\alpha)}{\sigma^2+\tau^2}.$$

It is the task of ordinary algebra to convert this back to an expression for $\mathbb{E}(q\ |\ y)$ in terms of $y$, because (insofar as $\mathbb{E}(q\ |\ y)$ is concerned) all variables now represent numbers:

$$\mathbb{E}(q\ |\ y) = \frac{\tau^2}{\sigma^2+\tau^2} \alpha + \frac{\sigma^2}{\sigma^2+\tau^2} y.$$

That is Equation (2). Casting an eye back over the calculations should relieve any mystery about where these coefficients came from or what they mean.
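
Equation (2) itself can also be confirmed by Monte Carlo: since $\mathbb{E}(q\ |\ y)$ is linear in $y$, an ordinary least-squares fit of simulated $q$ on $y$ should recover slope $\gamma=\sigma^2/(\sigma^2+\tau^2)$ and intercept $(1-\gamma)\alpha$. A minimal sketch, with the same arbitrary parameter values as above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
alpha, sigma2, tau2 = 2.0, 4.0, 1.0  # arbitrary illustrative values
gamma = sigma2 / (sigma2 + tau2)     # = 0.8

q = rng.normal(alpha, np.sqrt(sigma2), size=n)
y = q + rng.normal(0.0, np.sqrt(tau2), size=n)

# E(q | y) is linear in y, so OLS recovers its slope and intercept.
slope, intercept = np.polyfit(y, q, deg=1)
print(slope, gamma)                    # both ~ 0.8
print(intercept, (1 - gamma) * alpha)  # both ~ 0.4
```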

whuber
  • Very interesting, will have to work through the long post you linked to later. Just a quick question: Where does the linearity of the conditional expectation function come from? It really is sufficient that y is a linear function of independent random variables, regardless of how they are distributed? – CloseToC May 19 '14 at 20:35
  • @CloseToC A pretty thorough and general discussion of these relationships between correlation and regression appears at http://www.math.uah.edu/stat/sample/Covariance2.html. – whuber May 19 '14 at 21:21

The model implies that $y\mid q\sim\mathcal{N}(q,\sigma^2_u)$ and $q\sim\mathcal{N}(a,\sigma^2_q)$. By Bayes' rule:
$$p(q\mid y)\propto p(y\mid q,\sigma^2_u)\,p(q)$$
Ignoring constant factors (see here for a similar development):
$$\begin{align}p(q\mid y) & \propto \exp\left\{-\frac{(y-q)^2}{2\sigma^2_u}-\frac{(q-a)^2}{2\sigma^2_q}\right\}\\ &=\exp\left\{-\frac{1}{2}\left(\frac{y^2-2yq+q^2}{\sigma^2_u}+\frac{q^2-2qa+a^2}{\sigma^2_q}\right)\right\}\end{align}$$
Any term that does not include $q$ can be absorbed into the proportionality constant:
$$\begin{align} &\propto\exp\left\{-\frac{1}{2}\,\frac{-2\sigma^2_q yq+\sigma^2_q q^2+\sigma^2_u q^2-2\sigma^2_u qa}{\sigma^2_u\sigma^2_q}\right\}\\ &=\exp\left\{-\frac{1}{2}\,\frac{(\sigma^2_q+\sigma^2_u)q^2-2(\sigma^2_u a+\sigma^2_q y)q}{\sigma^2_u\sigma^2_q}\right\}\\ &=\exp\left\{-\frac{1}{2}\,\frac{q^2-2q\,\frac{\sigma^2_u a+\sigma^2_q y}{\sigma^2_q+\sigma^2_u}}{\frac{\sigma^2_q\sigma^2_u}{\sigma^2_q+\sigma^2_u}}\right\}\propto \exp\left\{-\frac{1}{2}\,\frac{\left(q-\frac{\sigma^2_u a+\sigma^2_q y}{\sigma^2_q+\sigma^2_u}\right)^2}{\frac{\sigma^2_q\sigma^2_u}{\sigma^2_q+\sigma^2_u}}\right\}\end{align}$$
Therefore:
$$E(q\mid y)=\frac{\sigma^2_u a+\sigma^2_q y}{\sigma^2_q+\sigma^2_u} =\left(1-\frac{\sigma^2_q}{\sigma^2_q+\sigma^2_u}\right)a+\frac{\sigma^2_q}{\sigma^2_q+\sigma^2_u}y$$
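
The completing-the-square step is easy to verify symbolically, e.g. with SymPy; the following sketch (symbol names are mine) checks that the original exponent and the completed square differ only by terms free of $q$:

```python
import sympy as sp

q, y, a = sp.symbols('q y a', real=True)
su2, sq2 = sp.symbols('su2 sq2', positive=True)  # sigma_u^2, sigma_q^2

# Exponent of the posterior kernel from Bayes' rule (constants dropped).
kernel = -sp.Rational(1, 2) * ((y - q)**2 / su2 + (q - a)**2 / sq2)

# Claimed completed square: Gaussian with this posterior mean and variance.
mu = (su2 * a + sq2 * y) / (sq2 + su2)
v = sq2 * su2 / (sq2 + su2)
completed = -(q - mu)**2 / (2 * v)

# The difference must not depend on q, so its q-derivative is zero.
print(sp.simplify(sp.diff(kernel - completed, q)))  # 0
```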

Sergio

Another way, the shortest one ;-)

In general, if $X$ and $Y$ have a bivariate normal distribution, then (Anderson, Theorem 2.5.1): $$E[X\mid Y]=E[X]+\frac{\text{Cov}(X,Y)}{V[Y]}(Y-E[Y])$$ and, when $E[X]=E[Y]$ (as in your model, where both equal $a$), this becomes $$E[X\mid Y]=\left(1-\frac{\text{Cov}(X,Y)}{V[Y]}\right)E[X]+\frac{\text{Cov}(X,Y)}{V[Y]}Y,$$ i.e. the well-known result that the expected value of $X$ given $Y$ is a weighted average of the mean of $X$ and $Y$.

In your model $E[q]=E[y]=a$, $V[y]=\sigma^2_q+\sigma^2_u$ and $\text{Cov}(y,q)=\sigma^2_q$ (see whuber's answer), so: $$E[q\mid y]=a+\frac{\sigma^2_q}{\sigma^2_q+\sigma^2_u}(y-a)= \left(1-\frac{\sigma^2_q}{\sigma^2_q+\sigma^2_u}\right)a+\frac{\sigma^2_q}{\sigma^2_q+\sigma^2_u}y$$
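
This closed form can also be checked by brute force: simulate the model, keep only draws with $y$ in a thin slice around some $y_0$, and compare the empirical mean of $q$ with the formula. A sketch with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)
a, sq2, su2 = 2.0, 4.0, 1.0  # arbitrary illustrative values

q = rng.normal(a, np.sqrt(sq2), size=2_000_000)
y = q + rng.normal(0.0, np.sqrt(su2), size=2_000_000)

y0 = 3.0                                 # condition on y close to y0
near = np.abs(y - y0) < 0.05
print(q[near].mean())                    # empirical E[q | y ≈ y0]
print(a + sq2 / (sq2 + su2) * (y0 - a))  # closed form: 2 + 0.8*1 = 2.8
```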

Sergio
  • Yes, it's the shortest in *length* -- but only because it relies on material that has already been posted! I think you would find it a challenge to find any answer shorter than the first line of mine, which I believe captures the essence of the question (and is more general than the theorem you quote). – whuber May 19 '14 at 21:42
  • Sorry, I can't understand. Did anyone quote Anderson? My second answer is "it is a well-known result". I've suggested my shortest-with-smile second answer just because I don't know what Fusscreme is looking for. If he is going to quote Aigner and Cain in a paper but must expound their statement, then something like "see Anderson, 2003, Theorem 2.5.1" could be the best way. – Sergio May 19 '14 at 21:52
  • Indeed this is very useful for me. Thank you both and everyone else very much; your help is much appreciated. – Fusscreme May 22 '14 at 07:02

I think the following argument shows why; unfortunately it's a bit messy. Much more elegant derivations certainly exist, as the linear Gaussian case is the best-understood statistical model in existence.

Anyway, we have that:

  1. $U\sim\mathcal{N}(0,\sigma^2)$

  2. $Q\sim\mathcal{N}(\alpha,\beta^2)$

  3. $Y=Q+U$.

  4. $U$ and $Q$ are independent.

Because a linear function of normal random variables is itself normal, and because $U$ and $Q$ are independent, it follows that $Y\mid Q \sim \mathcal{N}(Q,\sigma^2)$.

We can now write down the probability density function of $Q$ conditional on $Y=y$. By Bayes' theorem that's:

$$\frac{(\text{pdf of } Q)\times(\text{pdf of } Y\mid Q)}{\text{pdf of } Y}.$$

I won't write this out because it's very messy with all the Gaussian densities.

$Q\mid Y$ will be a normal random variable, which means that its mode is its mean. Ignoring the denominator (the normalizing constant), we're left with:

$$\frac{1}{2\pi\sigma\beta}\,\exp\{-\text{Something}(q)\}$$

We find the mode of the posterior distribution by choosing $q$ so as to maximise the density. That's going to be the conditional expected value too, because the mode is the mean for a Gaussian. To do that we can ignore everything except $\text{Something}(q)$, because the rest isn't a function of $q$.

If you do the algebra, $$\text{Something}(q) = \frac{1}{2}\left(\frac{(y-q)^2}{\sigma^2} + \frac{(q-\alpha)^2}{\beta^2}\right).$$

If you differentiate with respect to $q$ and set the derivative to zero,

$$\frac{q-y}{\sigma^2}+\frac{q-\alpha}{\beta^2}=0,$$

then solving for $q$ gives:

$$q=\frac{\beta^2}{\beta^2+\sigma^2}\,y+\frac{\sigma^2}{\beta^2+\sigma^2}\,\alpha\,,$$ as required!
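
The differentiate-and-solve step can be double-checked symbolically, e.g. with SymPy (a sketch; the symbol names are mine):

```python
import sympy as sp

q, y, alpha = sp.symbols('q y alpha', real=True)
sigma2, beta2 = sp.symbols('sigma2 beta2', positive=True)  # sigma^2, beta^2

# Something(q): the q-dependent part of the negative log-posterior.
something = sp.Rational(1, 2) * ((y - q)**2 / sigma2 + (q - alpha)**2 / beta2)

# Differentiate, set to zero, solve for q: the posterior mode (= mean).
q_star = sp.solve(sp.Eq(sp.diff(something, q), 0), q)[0]
print(sp.simplify(q_star - (beta2*y + sigma2*alpha) / (beta2 + sigma2)))  # 0
```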

CloseToC