3

I am a scientist fitting some binomial data, and I have been using maximum likelihood.

My model gives a probability for each datum, $L = P(y|\theta)$. The likelihood does not come from a simple logistic model; rather, the model estimates $P$ over a range of hyperparameters and returns the marginal.

I was very surprised to find that I get different estimates of theta if I maximise the likelihood of outcome 1 occurring

$\hat{\theta}_1=\underset{\theta}{\mathrm{argmax}}\, L_y$

or if I minimise the estimated probability of the opposite outcome occurring

$\hat{\theta}_0=\underset{\theta}{\mathrm{argmin}}\,(1-L_y)$.

And through experimenting I realised this is because in general,

$\underset{\theta}{\mathrm{argmax}} \Big[ \sum_y \log P(y\mid\theta) \Big] \ne \underset{\theta}{\mathrm{argmax}} \Big[ -\sum_y \log\big(1-P(y\mid\theta)\big) \Big]$

I thought my estimate should be symmetrical for the two outcomes. I'm sure it's something simple, so I looked for an explanation online, but I did not know where to start: googling for "minimum unlikelihood" and suchlike did not get me very far!
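To make the discrepancy concrete, here is a minimal sketch that reproduces it with a made-up stand-in model (a simple logistic curve with simulated data and a grid search over $\theta$, not my actual marginal model):

```python
import numpy as np

# Stand-in model (an assumption for illustration): P(y_i = 1 | theta) = sigmoid(theta * x_i)
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = (rng.random(50) < 1 / (1 + np.exp(-1.5 * x))).astype(int)  # simulate with theta = 1.5

def per_datum_lik(theta):
    """L_i = P(y_i | theta): probability the model assigns to each observed outcome."""
    p = 1 / (1 + np.exp(-theta * x))
    return np.where(y == 1, p, 1 - p)

thetas = np.linspace(-5, 5, 2001)
loglik = np.array([np.log(per_datum_lik(t)).sum() for t in thetas])
logunlik = np.array([np.log(1 - per_datum_lik(t)).sum() for t in thetas])

print(thetas[loglik.argmax()])    # argmax_theta sum_i log P(y_i | theta): the MLE
print(thetas[logunlik.argmin()])  # argmin_theta sum_i log(1 - P(y_i | theta)): runs to a grid edge
```

The two printed estimates differ: the second criterion is dominated by driving $1-L_i$ towards zero for a subset of the data, so it runs away from the MLE.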

**Edit**

It seems like $\hat{\theta}_1$ overweights outcome 0, and $\hat{\theta}_0$ overweights outcome 1, is that right?

Sanjay Manohar
  • The binomial likelihood for 0/1 data is the product over *all* the observations of $p(y|\theta)$. There's only one likelihood function for the entire sample, not separate ones for the 1's and the 0's. – Glen_b Jan 12 '17 at 11:12
  • I'm voting to close this question as off-topic because the equivalence between maximising $f(x)$ and minimising $1-f(x)$ does not sound of sufficient interest. – Xi'an Jan 12 '17 at 14:11
  • The likelihood function is NOT a probability function, because it is a function of the parameters for fixed values of the observations. You won't find an "unlikelihood" function because statisticians do not use that terminology. – Michael R. Chernick Jan 12 '17 at 14:36
  • I'm voting to close this question as off-topic because the OP is wrongly interpreting the likelihood function as a probability function. – Michael R. Chernick Jan 12 '17 at 14:37
  • @MichaelChernick I totally disagree... this has **nothing** to do with the question being off-topic. If someone misunderstands something about statistics, *this* is the place to ask the question. – Tim Jan 12 '17 at 15:05
  • @Michael Evidence of confusion in a question is often taken as a solid reason for keeping it open, not closing it! – whuber Jan 12 '17 at 15:05
  • Thanks for the comments, which eventually led me to see my logical error! (posted as additional answer) – Sanjay Manohar Jan 12 '17 at 16:37

2 Answers

7

Expanding on Glen_b's comment: the binomial likelihood is

$$ L(\theta\mid n,k) \propto \theta^k(1-\theta)^{n-k} $$

where $k$ is the number of successes in a sample of size $n$. So if you instead look at the number of failures $r = n-k$ and their probability $\xi = 1-\theta$, then you get exactly the same likelihood function:

$$ L(\xi\mid n,r) \propto \xi^r(1-\xi)^{n-r} = (1-\theta)^{n-k}\theta^k $$

The two parameterisations therefore have the same maximiser, with $\hat{\xi} = 1-\hat{\theta}$: the estimate *is* symmetrical in the two outcomes, provided you swap the counts along with the parameter.
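A quick numeric check of this symmetry (a minimal sketch; the counts $n=20$, $k=14$ and the grid are arbitrary choices):

```python
import numpy as np

n, k = 20, 14                              # e.g. 14 successes in 20 trials
theta = np.linspace(0.001, 0.999, 999)     # grid over the parameter

L_theta = theta**k * (1 - theta)**(n - k)  # L(theta | n, k)
L_xi    = theta**(n - k) * (1 - theta)**k  # L(xi | n, r), r = n - k, evaluated at xi = theta

print(theta[L_theta.argmax()])  # ~0.7 = k/n
print(theta[L_xi.argmax()])     # ~0.3 = r/n, i.e. xi_hat = 1 - theta_hat
```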

Tim
7

Thanks for the comments and answer, which eventually led me to the source of my confusion. I thought I'd share this for everyone, though it's pretty obvious now.

Let's say there are two observations $y_1,y_2$, which can be heads or tails, and I observed $HH$.

$P(HH\mid\theta) = P(y_1=H\mid\theta) \cdot P(y_2=H\mid\theta)$: the probability of observing two heads.

$\hat{\theta}=\underset{\theta}{\mathrm{argmax}}\, P(HH\mid\theta)$

Then I flipped the polarity of each datum and minimised instead, expecting to recover the same estimate. But what I actually computed was

$\underset{\theta}{\mathrm{argmin}}\big[ (1-P(y_1=H\mid\theta)) \cdot (1-P(y_2=H\mid\theta)) \big]$

which is not the same as $\underset{\theta}{\mathrm{argmin}}\,\big(1-P(HH\mid\theta)\big)$.

I thought that I was minimising the probability that I would not observe $HH$.

But actually, I was minimising the probability that I would observe $TT$.

Which is obviously not the same thing, because I forgot to account for the $HT$ and $TH$ possibilities!
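Here is a minimal sketch of the trap with a made-up per-datum model (the Gaussian-bump form of $P(y_i=H\mid\theta)$ is purely illustrative, not my real model):

```python
import numpy as np

# Made-up illustrative model: P(y_i = H | theta) = exp(-(theta - x_i)^2),
# with two data points at x = 0 and x = 3.
x = np.array([0.0, 3.0])
thetas = np.linspace(-1.0, 4.0, 5001)

p = np.exp(-(thetas[:, None] - x) ** 2)  # P(y_i = H | theta) per theta, per datum
P_HH = p.prod(axis=1)                    # P(HH | theta) = p1 * p2
P_TT = (1 - p).prod(axis=1)              # what I actually minimised: (1-p1)(1-p2)

print(thetas[P_HH.argmax()])        # 1.5: maximising P(HH)
print(thetas[(1 - P_HH).argmin()])  # 1.5: minimising 1 - P(HH) agrees, as it must
print(thetas[P_TT.argmin()])        # ~0 (or ~3): minimising P(TT) lands somewhere else
```

Maximising $P(HH\mid\theta)$ and minimising $1-P(HH\mid\theta)$ agree exactly; it is minimising $P(TT\mid\theta)$ that answers a different question.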

Sanjay Manohar
  • And so, in summary, it's actually *true* that the parameter set that maximises the likelihood of $\{y_i\}$ is **not** the same as the one that minimises the likelihood of the "inverse observations" $\{\neg\, y_i\}$ (each datapoint's polarity inverted). That's simply because $\{\neg\, y_i\} \ne \neg\,\{y_i\}$. – Sanjay Manohar Jan 14 '17 at 22:58