
I have the following model:

$Y_1=\beta+\varepsilon_1+\varepsilon_2$

$Y_2=\beta+\varepsilon_3+\varepsilon_4$

$Y_3=\beta+\varepsilon_1+\varepsilon_4+\varepsilon_5$

$Y_4=\beta+\varepsilon_2+\varepsilon_3+\varepsilon_5$

$\varepsilon_i\thicksim \text{iid } \mathcal{N}(0,\sigma^2), \forall i$

I would like to obtain the best (unbiased and with minimum variance) estimator of $\beta$. That is, I would like to know $\hat{\beta}=f(Y_1,Y_2,Y_3,Y_4)$. How should I obtain it?

I would really appreciate your help.

Nikita Samoylov
  • Is $\sigma^2$ a "nuisance parameter" or is it of particular interest? – Macro Jul 18 '12 at 12:21
  • [Cross-posted](http://math.stackexchange.com/q/172336/). – Did Jul 18 '12 at 13:15
  • good catch @Didier, it is my opinion that this question is more "at home" here than on the math SE. – Macro Jul 18 '12 at 13:57
  • Nikita, please tell us where you want your question to stay. We will coordinate migration/merging. – chl Jul 18 '12 at 14:05
  • @Macro: Why? – Did Jul 18 '12 at 14:47
  • @Didier, maybe I don't have a proper understanding of the scope of the math SE but this seems purely like a statistical modeling/parameter estimation problem. Of course it involves math but doesn't almost anything in statistics? – Macro Jul 18 '12 at 14:48
  • I am not sure myself where it belongs. Maybe keep it here since more people replied here. Thank you guys! I was not sure about cross-posting conventions. – Nikita Samoylov Jul 19 '12 at 00:34
  • @Macro, $\sigma^2$ is a nuisance parameter. I actually know it in my application. – Nikita Samoylov Jul 19 '12 at 00:36
  • Nikita: Where in the accepted post do you see an answer to your question, that is, if I read you correctly, a function $f$ defining the UMVE of $\beta$? If there is one, I might have entirely misunderstood the question, in which case please explain what your question is, really... – Did Jul 19 '12 at 07:24
  • @Macro Thanks for your comment. To me, both options (here and on math.SE) have their merits. This means in particular that the question is fully "at home" on math.SE. – Did Jul 19 '12 at 09:31
  • @Didier, I don't. I accepted that answer because for a person with my level of knowledge of statistics (and I think for most of the readers who would be interested in that question and not know the answer themselves) that answer is the most understandable and leads quickly to derivation of the function of interest. Your answer and Macro's are more to the point, they are very useful and I would gladly accept them as well, but I do not seem to be able to accept more than one. – Nikita Samoylov Jul 22 '12 at 13:21
  • Sorry but the level of knowledge is at best secondary here: some posts answer the question, others do not, and you acknowledge the answer you accepted does not (and nothing prevented you from asking for explanations, but you never did). For your interest, note that several assertions of the accepted post are dubious or worse. To begin with, despite the first two paragraphs there, the fact that the UMVE is linear (affine, really) is not an assumption but **a theoretical result** (specific to the area of Gaussian families), hence to consider only such estimators is **not** a restriction. .../... – Did Jul 22 '12 at 13:53
  • .../... Likewise, still in this Gaussian context, despite the second paragraph there, the MLE is **never** biased. And finally, no, *the solution does* not *require the use of Lagrange multipliers* (as another answer demonstrates). Note also that the accepted post did not introduce any kind of Lagrange multipliers before these appeared in another post (but, willing to teach, I am glad they did appear later on). Finally, I hope that your own computations indeed led you to the estimator stated in another post despite some contrary comments of the author of the accepted answer, and .../... – Did Jul 22 '12 at 13:54
  • .../... I very much hope *readers who would be interested in that question and not know the answer themselves* do NOT turn to the accepted answer for explanations. – Did Jul 22 '12 at 13:54

3 Answers


As you've described the problem, ${\boldsymbol Y} = (Y_1, Y_2, Y_3, Y_4)'$ will have a multivariate normal distribution with mean ${\boldsymbol \mu} = (\beta, \beta, \beta, \beta)'$ and covariance matrix

$$ \Sigma = \sigma^2 \left( \begin{array}{cccc} 2 & 0 & 1 & 1 \\ 0 & 2 & 1 & 1 \\ 1 & 1 & 3 & 1 \\ 1 & 1 & 1 & 3 \\ \end{array} \right) $$
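(To see where these entries come from, note for example that $\operatorname{Var}(Y_1) = \operatorname{Var}(\varepsilon_1 + \varepsilon_2) = 2\sigma^2$ and $\operatorname{Cov}(Y_1, Y_3) = \operatorname{Cov}(\varepsilon_1 + \varepsilon_2,\ \varepsilon_1 + \varepsilon_4 + \varepsilon_5) = \operatorname{Var}(\varepsilon_1) = \sigma^2$; the remaining entries follow the same way from the shared error terms.)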

Normally this type of covariance structure model would require specialized software like MPLUS. I believe it may be possible to "trick" lme into fitting a model like this, but it is simple enough to "build your own".

I'm not sure about getting the unbiased minimum variance estimator (although I'm sure the ordinary sample mean would be competitive), but I can describe how to get the maximum likelihood estimator (MLE), which is desirable for the reasons mentioned by Michael Chernick. The log-likelihood for a single observation of ${\boldsymbol Y}$ is

$$L(\beta, \sigma^2) = \log \left( \frac{1}{(2\pi)^2 |\Sigma|^{1/2}} \right) -\frac{1}{2} ({\boldsymbol Y}-{\boldsymbol \mu})' \Sigma^{-1} ({\boldsymbol Y}-{\boldsymbol \mu}) $$

which is only a function of $\beta$ and $\sigma^2$ since ${\boldsymbol \mu}$ only depends on $\beta$ and $\Sigma$ only depends on $\sigma^2$. We sum over the observations and optimize the resulting function as a function of $\beta, \sigma^2$ to get the MLE. I'll use the dmnorm() function from the R package mnormt to do this and give a rather crudely programmed example:

library(mnormt)  # provides rmnorm() and dmnorm()

set.seed(1234)
N <- 100

# covariance structure matrix (the covariance of Y is sigma^2 times this)
s <- matrix(0, 4, 4)
s[1,] <- c(2, 0, 1, 1)
s[2,] <- c(0, 2, 1, 1)
s[3,] <- c(1, 1, 3, 1)
s[4,] <- c(1, 1, 1, 3)

# generate data where the true values are beta = 1, sigma^2 = 3
y <- list()
for(i in 1:N) y[[i]] <- rmnorm(1, mean = c(1, 1, 1, 1), varcov = 3*s)

# negative log-likelihood; P[1] is beta, P[2] is sigma squared
L <- function(P)
{
   # crude barrier to prevent sigma squared being negative
   if( P[2] <= 0 ) return(Inf)

   like <- 0
   for(i in 1:N)
   {
      like <- like + dmnorm(y[[i]], mean = rep(P[1], 4), varcov = P[2]*s, log = TRUE)
   }
   return(-like)
}

# arbitrary starting values of beta = 1, sigma^2 = 1 for the optimization
optim(c(1, 1), L)$par
# [1] 0.9109401 3.0786393

You can get approximate confidence intervals either by bootstrapping or by using the Fisher information, which requires second derivatives of the log-likelihood, obtained either numerically (optim() will return a numerical Hessian if you call it with hessian=TRUE) or analytically; you may find this thread helpful for that.
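For example, a minimal sketch of the Fisher-information route, reusing the L function and simulated data above (this just forms a rough Wald-type 95% interval for $\beta$ and is only illustrative):

fit <- optim(c(1, 1), L, hessian = TRUE)   # also return a numerical Hessian at the optimum

# invert the Hessian of the negative log-likelihood to approximate
# the covariance matrix of (beta, sigma^2)
vc <- solve(fit$hessian)

# approximate 95% Wald confidence interval for beta
fit$par[1] + c(-1, 1) * 1.96 * sqrt(vc[1, 1])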

Macro
  • Note that in this special case finding the UMVU estimator is particularly easy. Indeed writing out the density, we see that we obtain an exponential family with complete sufficient statistic $(\mathbf Y' \Sigma^{-1} \mathbf Y, \mathbf 1'\Sigma^{-1} \mathbf Y)$. All we are left to do is find some unbiased estimator of $\beta$ and this is guaranteed to be UMVU. Noting already that $\Sigma^{-1} \mathbf 1 = (1/2, 1/2, 0, 0)$, we are done! – cardinal Jul 18 '12 at 17:17
  • @cardinal Why does my intuition fail me here? Are $Y_3$ and $Y_4$ left out because of their larger variances? I was thinking that something like $Y_1+Y_2-(Y_3+Y_4)/2$ could be better. – Michael R. Chernick Jul 18 '12 at 17:28
  • Very slick, @cardinal. That's a terrific 3 sentence answer that I'd +1 but I know you never post those. – Macro Jul 18 '12 at 17:53
  • By the way, I'm noticing that the MLE is pretty darn close to $\frac{1}{2}(\overline{Y}_1 + \overline{Y}_2)$ after some experimentation. – Macro Jul 18 '12 at 17:56
  • @Michael: Maybe(?) it is helpful to note that $Y_1 + Y_2 - Y_3 - Y_4$ recovers the last error term up to a known constant with probability one. This should maybe be a hint that the last two aren't really telling us anything new. – cardinal Jul 18 '12 at 18:06
  • Okay, that sum $=-2\varepsilon_5$. But why should that convince me? Each $Y$ is equal to beta + an error term. If the error terms were iid, the average of the 4 would be BLUE. So why does a little dependence change that? It would seem that because of the negative terms the covariance piece could knock off some of the variance. – Michael R. Chernick Jul 18 '12 at 19:40
  • You are very welcome @NikitaSamoylov! – Macro Jul 19 '12 at 11:28
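To spell out the last step of cardinal's comment above: writing $\Sigma = \sigma^2 S$ with $S$ the known structure matrix from this answer, one has $S^{-1}\mathbf 1 = (1/2, 1/2, 0, 0)'$, so the estimator $$\hat\beta = \frac{\mathbf 1' S^{-1}\mathbf Y}{\mathbf 1' S^{-1}\mathbf 1} = \tfrac12(Y_1 + Y_2)$$ is unbiased and a function of the complete sufficient statistic, hence UMVU by the Lehmann–Scheffé theorem.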

A general result on Gaussian families, which should be in your lecture notes, says that the best unbiased estimator $\hat\beta$ is an affine transform of the vector $Y=(Y_k)_{1\leqslant k\leqslant 4}$. Unbiasedness for every value of $\beta$ imposes that this transform must be linear. Since the coefficient of $\beta$ in each $Y_k$ is $1$, one sees that $\hat\beta=\langle x, Y\rangle=\sum\limits_{k=1}^4x_kY_k$ for some $x=(x_k)_{1\leqslant k\leqslant 4}$ such that $\langle x, 1\rangle=\sum\limits_{k=1}^4x_k=1$.

At this point, the method of Lagrange multipliers readily yields the value of $x=(x_k)_{1\leqslant k\leqslant 4}$, hence of $\hat\beta$, but, in the present case, symmetry considerations offer a nice alternative proof.

To see this, note that the symmetry $\varepsilon_1\leftrightarrow\varepsilon_3$, $\varepsilon_2\leftrightarrow\varepsilon_4$, exchanges $Y_1$ and $Y_2$ and exchanges $Y_3$ and $Y_4$. Since the distribution of $Y$ is invariant by this operation, this yields $x_1=x_2$ and $x_3=x_4$. Hence $x_1=x_2=\frac12(1-t)$ and $x_3=x_4=\frac12t$ for some $t$.

For every $x=(x_k)_{1\leqslant k\leqslant 4}$, the variance of $\langle x, Y\rangle=\sum\limits_{k=1}^4x_kY_k$ is $$\sigma^2\cdot((x_1+x_3)^2+(x_1+x_4)^2+(x_2+x_4)^2+(x_2+x_3)^2+(x_3+x_4)^2), $$ and, when $x=(x_k)_{1\leqslant k\leqslant 4}$ is as above, the sum in the parenthesis is $\frac14+\frac14+\frac14+\frac14+t^2$, which is minimum for $t=0$. Finally, all this proves that $$ \hat\beta=\tfrac12(Y_1+Y_2). $$
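As a quick numerical sanity check of this result (a simulation sketch, not part of the proof), one can generate the $\varepsilon_i$ directly in R and compare $\tfrac12(Y_1+Y_2)$ with the plain average of all four $Y_k$:

set.seed(1)
beta <- 1; sigma2 <- 3; n <- 1e5

# simulate the five error terms and build Y1, ..., Y4 from the model
e <- matrix(rnorm(5 * n, sd = sqrt(sigma2)), n, 5)
Y1 <- beta + e[, 1] + e[, 2]
Y2 <- beta + e[, 3] + e[, 4]
Y3 <- beta + e[, 1] + e[, 4] + e[, 5]
Y4 <- beta + e[, 2] + e[, 3] + e[, 5]

# the estimator derived above versus the naive average of all four
c(mean((Y1 + Y2) / 2), var((Y1 + Y2) / 2))                   # approx. beta = 1, variance sigma^2 = 3
c(mean((Y1 + Y2 + Y3 + Y4) / 4), var((Y1 + Y2 + Y3 + Y4) / 4))  # approx. 1, variance 1.25 * sigma^2 = 3.75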

Did
  • I mentioned maximum likelihood and Macro went to the trouble of showing you how to compute it. I also mentioned best linear unbiased estimation, and Didier proposes a best linear unbiased estimator with a constraint that the coefficients sum to 1 and (presumably) are all $\geq 0$. But such a constraint on the coefficients is unnecessary. Also, although not exactly stated, it is possible to misconstrue Didier's claim about unbiased estimators of beta: it is not necessarily the case that all unbiased estimates of beta are linear in the $Y_i$s. I put the variance of the estimate of beta in the unconstrained case in my answer. – Michael R. Chernick Jul 18 '12 at 15:55
  • It can be minimized directly by calculus without Lagrange multipliers because there are no constraints on the coefficients. – Michael R. Chernick Jul 18 '12 at 15:56
  • @Michael: I don't believe Didier unintentionally dropped your proposed nonnegativity constraint. Indeed, this situation is completely analogous to a comment I made on one of your answers a day or two ago (namely, that artificially imposing a similar constraint did not yield the desired answer). Because this is a Gaussian model, the unbiased estimator of $\beta$ will be a linear function of the $Y_k$. – cardinal Jul 18 '12 at 16:40
  • @cardinal My comment in my revised answer that the coefficients are unconstrained is obviously wrong, because it leads to a minimum of 0 by taking all coefficients equal to 0. But that would not be an unbiased estimator of beta; it is trivially identically 0. I will fix my answer. – Michael R. Chernick Jul 18 '12 at 16:59
  • My comments above about no constraint are wrong and the comments should be removed. Sorry. – Michael R. Chernick Jul 18 '12 at 17:06
  • (+1) I like the use of notions of symmetry here and the elementary tack. To our readers, the statement that unbiasedness implies linearity in this case may appear a little opaque. Expanding briefly on that might make a nice addition. :) – cardinal Jul 18 '12 at 17:14
  • I think there is something wrong with Didier's result though, because $Y_3$ and $Y_4$ contain information about beta and it ought to be possible to incorporate them to lower the variance. Didier's solution has variance $=\sigma^2$. – Michael R. Chernick Jul 18 '12 at 17:14
  • @Michael and Didier, I have buried a comment below Macro's answer with an entirely different argument. – cardinal Jul 18 '12 at 17:18
  • @cardinal: This comment of yours describes the (structural) reason why the (somewhat tedious, but more elementary) computations in my post indeed yield these coefficients. Well done. – Did Jul 18 '12 at 17:40

The $Y_i$s are dependent because some share the same noise terms. However, you still have a well-defined likelihood function for beta. The theory of maximum likelihood still applies: determine the likelihood and obtain the MLE for beta.

A weighted average of the $Y_i$s would be unbiased, and you could find the weights that minimize the variance among these linear unbiased estimators. The MLE could be biased, but I am reasonably sure it will be consistent, asymptotically normal, and achieve the Cramér–Rao lower bound. This should be checked.

For the best linear unbiased estimator of beta, write $B= a_1Y_1+a_2Y_2+a_3Y_3+ a_4Y_4$.

Then $${\rm Var}(B) =a_1^2 {\rm Var}(Y_1)+a_2^2{\rm Var}(Y_2) + a_3^2 {\rm Var}(Y_3) +a_4^2 {\rm Var}(Y_4) + 2a_1 a_2 {\rm Cov}(Y_1,Y_2)+2a_1 a_3 {\rm Cov}(Y_1, Y_3)+2a_1a_4 {\rm Cov}(Y_1,Y_4)+2a_2a_3 {\rm Cov}(Y_2,Y_3)+2a_2a_4 {\rm Cov}(Y_2,Y_4) +2a_3a_4 {\rm Cov}(Y_3, Y_4).$$

Now, by the definition of the $Y_i$s, ${\rm Cov}(Y_1,Y_2)=0$, and all the other covariances equal $\sigma^2$ because each of the remaining pairs of $Y_i$s shares exactly one common $\varepsilon$.

So $${\rm Var}(B)= \sigma^2(2a_1^2 +2a_2^2+3 a_3^2+3a_4^2 + 2a_1 a_3 +2a_1 a_4 +2 a_2 a_3+2a_2 a_4 +2 a_3 a_4)$$

Now for $B$ to be unbiased we have the constraint $a_1+a_2+a_3+a_4=1$, so the minimizing coefficients can be found, for example, with Lagrange multipliers (a sketch follows).
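For completeness, here is a brief sketch of that constrained minimization (it reaches the same answer as the other posts). Writing ${\rm Var}(B) = \sigma^2\, a' S a$ with $a = (a_1, a_2, a_3, a_4)'$ and $S$ the covariance-structure matrix from the accepted answer, the stationarity condition for minimizing $a' S a$ subject to $\mathbf 1' a = 1$ is $$2 S a = \lambda \mathbf 1 \quad\Longrightarrow\quad a = \tfrac{\lambda}{2}\, S^{-1}\mathbf 1 = \tfrac{\lambda}{2}\,(1/2,\, 1/2,\, 0,\, 0)'.$$ The constraint $\mathbf 1' a = 1$ then forces $\lambda = 2$, so $a = (1/2, 1/2, 0, 0)'$ and $B = \tfrac12(Y_1 + Y_2)$, in agreement with the other answers.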

Michael R. Chernick
  • This certainly looks like a case where the likelihood would be very well behaved and have the usual nice asymptotic properties. I don't see a better approach to the problem. – Michael R. Chernick Jul 18 '12 at 11:47
  • It's good to see that you're getting the hang of equation editing on here. One suggestion that would help you here is that, when you're writing out an equation, you only need to put the `$$` (or `$`) at the beginning and the end. Using the `$` puts you into "equation environment" and you stay there until it sees another `$`. You don't need to use `$` every time you write a term in an equation. You seem to have done this with your last equation, $a_1+a_2+a_3+a_4=1$, but not with the others. – Macro Jul 18 '12 at 18:04
  • @Macro I figured that out while I was doing it but wasn't consistent yet. – Michael R. Chernick Jul 18 '12 at 19:28
  • Cool - a subtle issue - opening and closing with `$$` will make your equation centered and on its own line (best for big equations or those you want to be a focal point, similar to a numbered equation in a paper), while opening and closing with `$` is for smaller equations that you can put into a sentence, for example. – Macro Jul 18 '12 at 19:49
  • Don't worry @Michael, you'll get the hang of it soon! Some of us really appreciate that you are now posting your answers with LaTeX code in it! – Néstor Jul 18 '12 at 22:47