
I am stuck on how to solve this problem.

We have two sequences of random variables, $X_i$ and $Y_i$ for $i=1,\ldots,n$, where the $X_i$ and $Y_i$ are independent and exponentially distributed with rate parameters $\lambda$ and $\mu$, respectively. However, instead of observing the $X_i$ and $Y_i$, we observe only $Z$ and $W$:

$Z_i=\min(X_i,Y_i)$, and $W_i=1$ if $Z_i=X_i$ and $W_i=0$ if $Z_i=Y_i$. I have to find closed forms for the maximum likelihood estimators of $\lambda$ and $\mu$ on the basis of $Z$ and $W$, and further show that these are global maxima.

Now, I know that the minimum of two independent exponentials is itself exponential, with rate equal to the sum of the rates, so $Z$ is exponential with parameter $\lambda+\mu$. Thus the maximum likelihood estimator of the combined rate is $\widehat{\lambda+\mu}=1/\bar{Z}$.
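A quick simulation sketch supports this (assuming NumPy; the rates $\lambda=2$ and $\mu=3$ are illustrative, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, mu, n = 2.0, 3.0, 200_000          # illustrative rates and sample size

x = rng.exponential(1 / lam, n)          # X_i ~ Exp(rate = lam)
y = rng.exponential(1 / mu, n)           # Y_i ~ Exp(rate = mu)
z = np.minimum(x, y)                     # Z_i = min(X_i, Y_i)

# If Z ~ Exp(lam + mu), then E[Z] = 1/(lam + mu), so 1/mean(Z) estimates lam + mu.
print(z.mean(), 1 / (lam + mu))          # both ~ 0.2
print(1 / z.mean(), lam + mu)            # both ~ 5.0
```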

But I'm stuck on where to go from here. I know that $W$ is Bernoulli with parameter $p=P(Z_i=X_i)$, but I don't know how to convert this into a statement about one of the parameters. For example, what would the MLE $\bar{W}$ be estimating in terms of $\lambda$ and/or $\mu$? I understand that if $Z_i=X_i$, then $\mu=0$, but I'm having a hard time figuring out how to come up with any algebraic statement here.
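As an empirical check on what $\bar{W}$ tracks (same illustrative setup as above; the reference value uses the standard fact that $P(X<Y)=\lambda/(\lambda+\mu)$ for independent exponentials):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, mu, n = 2.0, 3.0, 200_000           # illustrative values

x = rng.exponential(1 / lam, n)
y = rng.exponential(1 / mu, n)
w = (x < y).astype(float)                # W_i = 1 exactly when Z_i = X_i

# Sample mean of W versus lam / (lam + mu)
print(w.mean(), lam / (lam + mu))        # both ~ 0.4
```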

UPDATE 1: So I have been told in the comments to derive the likelihood for the joint distribution of $Z$ and $W$.

So $f(Z,W)=f(Z|W=1)\cdot p+f(Z|W=0)\cdot (1-p)$ where $p=P(Z_i=X_i)$. Correct? I don't know how else to derive a joint distribution in this case, since $Z$ and $W$ are not independent.

So this gives us $f(Z_i,W_i)=p\lambda e^{-\lambda z_i}+(1-p)\mu e^{-\mu z_i}$, by the definition of $W$ above. But now what? This doesn't get me anywhere. If I go through the steps of calculating the likelihood, I get (using $m$ and $n$ as the sample sizes for each part of the mixture):

$L(\lambda,\mu)=p^m\lambda^m e^{-\lambda \sum{z_i}}+(1-p)^n\mu^n e^{-\mu \sum{z_i}}$

$\log L=m\log p+m\log\lambda-\lambda \sum{z_i}+n\log(1-p)+n\log\mu-\mu \sum{z_i}$

If I take the partial derivatives, this tells me that my MLE estimates for $\lambda$ and $\mu$ are just the average of the $Z$'s conditional on $W$. That is,

$\hat{\lambda}=\frac{\sum{Z_i}}{m}$

$\hat{\mu}=\frac{\sum{Z_i}}{n}$

and

$\hat{p}=\frac{m}{n+m}$
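A quick numerical check of these candidate formulas (a sketch with the same illustrative rates $\lambda=2$, $\mu=3$; NumPy assumed), printing the group averages next to the true parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, mu, N = 2.0, 3.0, 200_000           # illustrative true rates

x = rng.exponential(1 / lam, N)
y = rng.exponential(1 / mu, N)
z = np.minimum(x, y)
w = x < y                                # W_i = 1 when Z_i = X_i

m = w.sum()                              # size of the W = 1 group
n_0 = N - m                              # size of the W = 0 group

lam_hat = z[w].sum() / m                 # candidate: average of Z given W = 1
mu_hat = z[~w].sum() / n_0               # candidate: average of Z given W = 0
p_hat = m / N                            # = m / (n + m) in the notation above

print(lam_hat, mu_hat, p_hat)            # compare with lam = 2, mu = 3, p = 0.4
```

In this run both group averages land near $0.2=1/(\lambda+\mu)$ rather than near the rates themselves, which is consistent with the objections raised in the comments below.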

  • Having just answered a similar MLE question today, may I direct you towards [that solution](http://stats.stackexchange.com/a/124408/919) for some ideas? The relationship between the questions is that your data also break up naturally into two disjoint groups: those where $W=0$ and those where $W=1$. It all comes down to writing down the likelihood for an observation of the form $(Z,W)=(z,0)$; the symmetry between $X$ and $Y$, $\mu$ and $\lambda$, immediately produces the likelihood for data of the form $(z,1)$ and then you're off and running. – whuber Nov 17 '14 at 20:56
  • Do not rush to writing the maximum likelihood! First, express the joint distribution of $(Z,W)$, then deduce the likelihood associated with the sample of $(Z_i,W_i)$'s, which happens to be closed-form thanks to the exponential assumption. Then and only then can you try to maximise the function and hence derive the maximum likelihood estimators. – Xi'an Nov 17 '14 at 20:57
  • @whuber: (+1) it is rather straightforward indeed and involves the separation between the $(z_i,1)$'s and the $(z_i,0)$'s, but _both_ groups involve _both_ $\mu$ and $\lambda$, since they bring information on _both_ $X_i$ and $Y_i$, since $W_i=\mathbb{I}(X_i<Y_i)$. – Xi'an Nov 17 '14 at 21:00
  • @Xi'an That's right--and the parallels with the Normal-theory example I link to continue to hold, because there both groups provide information about the common parameter $\sigma$ (the scale), whose estimate will thereby involve "pooling" data from the groups. Here it will be seen that $\bar W$ tells us how the estimate of $\lambda+\mu$ (the rate, or inverse scale, for $Z$) should be apportioned into separate estimates of $\lambda$ and $\mu$. – whuber Nov 17 '14 at 21:05
  • I've read through the other thread, whuber, but I honestly don't understand how to apply that to this example. $Z$ and $W$ aren't independent, so how do I derive the joint distribution? – Ryan Simmons Nov 17 '14 at 22:04
  • OP updated. Thanks for the advice, guys. Is the way it is done now fundamentally correct? – Ryan Simmons Nov 17 '14 at 22:40
  • Because $(X_i,Y_i)$ is independent of all other $(X_j,Y_j)$ for $j\ne i$, the likelihood must be a product of $n$ terms. It cannot possibly be the sum you have written down. An error that almost cancels this one is to assume that the logarithm function is linear(!) when you go from the expression for $L$ to that of $\log(L)$ (causing you to arrive at a reasonable-looking result despite two mistakes--but one wouldn't call that "fundamentally correct"). The fact that $Z_i$ and $W_i$ do not appear to be independent is of little consequence: just write down their *joint distribution.* – whuber Nov 17 '14 at 23:02
  • Is the joint distribution that I wrote down not right? How can I find the joint distribution other than the method I used? I see the mistakes I made with the likelihood, but what about the joint distribution? – Ryan Simmons Nov 17 '14 at 23:06
  • Can you give me a hint about what I am missing here? – Ryan Simmons Nov 17 '14 at 23:15

1 Answer


I don't have enough points to comment, so I will write here. I think the problem you posted can be viewed from a survival analysis perspective, if you consider the following:

$X_i$: True survival time,

$Y_i$: Censoring time,

Both have an exponential distribution, with $X$ and $Y$ independent. Then $Z_i$ is the observed survival time and $W_i$ is the censoring indicator.

If you are familiar with survival analysis, I believe you can start from this point.

Note: a good source is _Analysis of Survival Data_ by D. R. Cox and D. Oakes.

Below is an example. Assume the p.d.f. of the survival time distribution is $f(t)=\rho e^{-\rho t}$. Then the survival function is $S(t)=e^{-\rho t}$, and the log-likelihood is:

$\ell= \sum_u \log f(z_i) + \sum_c \log S(z_i)$

with summation over the uncensored observations ($u$) and the censored observations ($c$), respectively.

Because $f(t)=h(t)S(t)$, where $h(t)$ is the hazard function, this can be written as:

$\ell= \sum_u \log h(z_i) + \sum_i \log S(z_i)$

where the second sum runs over all observations, censored or not. For the exponential, $h(t)=\rho$ and $\log S(z_i) = -\rho z_i$, so, with $d$ denoting the number of uncensored observations (the cases with $W_i=1$),

$\ell= d \log \rho - \rho \sum_i z_i$

Setting $\partial \ell / \partial \rho = d/\rho - \sum_i z_i$ to zero, the maximum likelihood estimator $\hat{\rho}$ of $\rho$ is:

$\hat{\rho}=d\big/\sum_i z_i$
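As a sanity check, here is a simulation sketch (illustrative rates; NumPy and SciPy assumed) comparing this closed form, together with its mirror image $\hat{\mu}=(n-d)/\sum_i z_i$ for the other rate, against a direct numerical maximization of the joint log-likelihood $\ell(\lambda,\mu)=d\log\lambda+(n-d)\log\mu-(\lambda+\mu)\sum_i z_i$, which is what the $f=hS$ factorization gives when applied symmetrically to both coordinates of $(Z_i,W_i)$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
lam, mu, n = 2.0, 3.0, 50_000            # illustrative true rates

x = rng.exponential(1 / lam, n)
y = rng.exponential(1 / mu, n)
z = np.minimum(x, y)
d = int((x < y).sum())                   # number of uncensored cases (W_i = 1)

# Closed forms: rho-hat from the answer (playing the role of lambda),
# and its mirror image for mu.
lam_hat = d / z.sum()
mu_hat = (n - d) / z.sum()

# Numerical check: maximize l(lam, mu) = d log(lam) + (n-d) log(mu) - (lam+mu) sum(z)
def negloglik(theta):
    a, b = theta
    return -(d * np.log(a) + (n - d) * np.log(b) - (a + b) * z.sum())

res = minimize(negloglik, x0=[1.0, 1.0], bounds=[(1e-9, None), (1e-9, None)])
print(lam_hat, mu_hat)                   # ~ 2.0 and ~ 3.0
print(res.x)                             # should agree with the closed forms
```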

jujae