
Let $X_1,X_2,\ldots, X_{n_1}$ be IID with PDF $f(x-\theta) $, for $-\infty<x<\infty$ and $-\infty<\theta<\infty$. Denote the CDF of $X_i$ by $F(x-\theta)$. Let $Z_1,Z_2, \ldots, Z_{n_2}$ denote the censored observations. For these observations we only know that $Z_j >a $ for some $a$ that is known and that the $Z_j$s are independent of the $X_i$s.

Then the "observed" likelihood and the "complete" likelihood, the latter covering both the observed and the censored data, are given by:

$$L(\theta| \mathbf{x} )= [1-F(a-\theta)]^{n_2} \prod_{i=1}^{n_1} f(x_i -\theta) $$

$$ L( \theta |\mathbf{x,z})=\prod^{n_1}_{i=1}f(x_i -\theta) \prod_{i=1}^{n_2}f(z_i-\theta) $$

My question is why do the likelihoods take that form? I believe for the $Z_j$s we have a left censoring, as they only take values greater than $a$, as opposed to the $X_i$s that are not restricted in that way. We also know that they are mutually independent.

Could it be that the "observed" likelihood is obtained after using that independence? We have $n_2$ observations greater than $a$, times the joint density of the $X_i$s, right?

A little more explanation on these two equations would go a long way.

Thank you in advance.

JohnK
  • Please clarify: in what sense $Z$s are independent of $X$'s? Since what I understand is that $Z = X$ if $X>a$, since you write that the $X's$ are unobserved. Are they supposed to come from two independent samples from the same population? But then in what sense $X$'s are unobserved? But if this is the case, why the likelihood of the $X$'s only, should be anything else than just the product of the densities? – Alecos Papadopoulos Jan 09 '14 at 19:41
  • @AlecosPapadopoulos They are mutually independent. This is one sample with some of its observations, the $Z_j$s censored. By "unobserved", I meant censored. Apologies for the confusion. I edited my post. – JohnK Jan 09 '14 at 19:49
  • By definition your $Z_j$ are *right* censored. When you refer to "that" form, precisely what aspect of the likelihood are you asking about? – whuber Jan 09 '14 at 19:59
  • @whuber Thank you for clarifying. Well, if the likelihood for the $X_i$s is like my first equation, what is the original density of $X_i$? Same goes for equation $2$. Do we write it like that because the $X_i$s and the $Z_j$s are independent? – JohnK Jan 09 '14 at 20:06
  • I do not understand the questions in your comment. By definition, the likelihood of $(x,z)$ is the probability of observing $(x,z)$ as a function of $\theta$. The information you give in your first paragraph more or less stipulates how to find the probability of any individual $x_i$ or $z_j$, and the independence assumption implies those probabilities are multiplied: that's all that's going on here. For an example, you might like to read over my answer at http://stats.stackexchange.com/a/49456/919. – whuber Jan 09 '14 at 20:18
  • @whuber I guess I do not understand why we have the probability that $Z$ is greater than $a$ in the likelihood of $x$. I am having trouble understanding the presence of the first term in the product, based on the information I have given. – JohnK Jan 09 '14 at 20:24
  • The first equation makes no sense. Before "becoming" a likelihood, it was a joint density, $f(\mathbf x \mid \theta) = \prod_{i=1}^{n_1} f(x_i -\theta)$. The $\mathbf z$'s must somehow appear in the argument of $L()$ in order for _anything_ related to them to be included in the RHS. – Alecos Papadopoulos Jan 09 '14 at 20:31
  • @AlecosPapadopoulos Precisely what has been bothering me. – JohnK Jan 09 '14 at 20:32
  • And the second equation looks wrong: as it is now, it is a perfectly usual (no censoring) joint density of two independent samples from the same population. Is there a book from which these come? – Alecos Papadopoulos Jan 09 '14 at 20:58
  • @AlecosPapadopoulos Sure. If you have it on your shelves, consult Hogg, Craig and McKean, "Introduction to Mathematical Statistics", 7th edition, Chapter 6, page 369. This is part of their discussion of the EM algorithm. – JohnK Jan 09 '14 at 20:59
  • @Alecos although the description of notation is a little off in the post, its meaning is clear: to understand the censoring, we contemplate the (hypothetical) *uncensored* data (giving the second likelihood, where the $z_i$ are uncensored observations governed by the distribution $\theta$). After censoring, the likelihood equals the first expression. The indexing and naming of variables may be a little problematic here; I prefer to use unique indexes for the variables, as illustrated at http://stats.stackexchange.com/a/49456, because it helps avoid confusion. – whuber Jan 09 '14 at 21:19
  • John, I just did, and you have misinterpreted the book (or you have transcribed the information here in a way that is confusing, at least to me, and I am not referring to the likelihoods). Later I will write an answer on the issue. (cc @whuber) – Alecos Papadopoulos Jan 09 '14 at 21:21
  • @AlecosPapadopoulos But this is the exact wording of their example. Take your time. – JohnK Jan 09 '14 at 21:26
  • @whuber That does it, thank you. The first term is the probability we observe a $Z_j$. It's finally clear. – JohnK Jan 09 '14 at 21:37
  • Look also at the beginning of the section on the EM algorithm, where the authors describe the general setting, and how the complete, observed, and conditional likelihoods are related. Derive the conditional likelihood for $\mathbf z$ and then divide the complete likelihood by it, to obtain the observed likelihood. – Alecos Papadopoulos Jan 09 '14 at 21:53
  • @AlecosPapadopoulos Yes, I suppose that's one way to show it, although the authors (misleadingly) present the conditional distribution of $\mathbf{z}$ as a result of these two. Anyway, I'm glad I finally figured it out because I do some of their most difficult exercises and then get stuck on some nonsense notation like that one. A complete waste of my time and your energy. Thank you. – JohnK Jan 09 '14 at 21:59

1 Answer


In the case of your first equation:

$$L(\theta| \mathbf{x} )= [1-F(a-\theta)]^{n_2} \prod_{i=1}^{n_1} f(x_i -\theta) $$

it's assumed you only observe the $n_1$ values of $\mathbf{x}$, but that you also know there were $n_2$ observations censored at $a$. It might have been better for the authors to have written the left hand side as $L(\theta | \mathbf{x}, n_2, a)$ for clarity.
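To make this concrete, here is a minimal numerical sketch (not from the book): it assumes a standard normal density for $f$, so that $f(x-\theta)$ is the $N(\theta,1)$ density, and all function and variable names are illustrative. The observed-data log-likelihood adds one $\log[1-F(a-\theta)]$ term per censored observation to the usual sum of $\log f(x_i-\theta)$ terms.

```python
# Minimal sketch (illustrative names; standard normal f assumed):
# observed-data log-likelihood L(theta | x, n2, a) for a location model
# with n2 observations right-censored at the known point a.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def observed_loglik(theta, x, n2, a):
    # fully observed points contribute log f(x_i - theta)
    observed_part = np.sum(norm.logpdf(x - theta))
    # each censored point contributes log[1 - F(a - theta)] = log P(Z_j > a)
    censored_part = n2 * norm.logsf(a - theta)
    return observed_part + censored_part

# toy usage: simulate, censor at a, and maximize over theta
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, size=200)      # true theta = 2
a = 2.5
x = sample[sample <= a]                     # the observed x_i's
n2 = int(np.sum(sample > a))                # how many were censored at a
fit = minimize_scalar(lambda t: -observed_loglik(t, x, n2, a),
                      bounds=(-10.0, 10.0), method="bounded")
print(fit.x)   # MLE of theta; close to 2 despite the censoring
```

Dropping the `censored_part` term gives the likelihood discussed next, which uses only the $\mathbf{x}$ values and throws away the censoring information.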

If you didn't know anything about the censored observations, including how many of them there were, the likelihood function would then be a function solely of the $\mathbf{x}$, by necessity. You'd then get:

$$L(\theta| \mathbf{x} )=\prod_{i=1}^{n_1} f(x_i -\theta) $$

which is what I suspect you expected to see for $L(\theta | \mathbf{x})$.

In the case of your second equation:

$$L( \theta |\mathbf{x,z})=\prod^{n_1}_{i=1}f(x_i -\theta) \prod_{i=1}^{n_2}f(z_i-\theta) $$

it's assumed that you are observing all the $\mathbf{z}$ values, i.e., none of them are censored. Hence the likelihood function is the product of the likelihood functions for the $\mathbf{x}$ and the $\mathbf{z}$, which is why it is labelled the "complete data" likelihood.
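As a sanity check (this is the relationship @AlecosPapadopoulos sketches in the comments, not something stated in the book excerpt above), write $k(\mathbf{z}\mid\theta,\mathbf{x})$ for the conditional density of the censored values given that each exceeds $a$, i.e. $k(\mathbf{z}\mid\theta,\mathbf{x})=\prod_{j=1}^{n_2} f(z_j-\theta)/[1-F(a-\theta)]$. Dividing the complete-data likelihood by it recovers the observed-data likelihood:

$$ \frac{L(\theta\mid\mathbf{x},\mathbf{z})}{k(\mathbf{z}\mid\theta,\mathbf{x})} = \frac{\prod_{i=1}^{n_1} f(x_i-\theta)\,\prod_{j=1}^{n_2} f(z_j-\theta)}{\prod_{j=1}^{n_2} \dfrac{f(z_j-\theta)}{1-F(a-\theta)}} = [1-F(a-\theta)]^{n_2}\prod_{i=1}^{n_1} f(x_i-\theta) = L(\theta\mid\mathbf{x}). $$

This factorization of the complete-data likelihood into the observed-data likelihood times a conditional density is exactly what the EM algorithm discussion in that chapter builds on.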

jbowman