
I am deriving logistic regression's likelihood. I have seen two different versions:

$$\begin{equation} f(y|\beta)={\displaystyle \prod_{i=1}^{N} \frac{n_i!} {y_i!(n_i-y_i)!}}\, \pi_{i}^{y_i}(1-\pi_i)^{n_i - y_i} \tag 1 \end{equation}$$

Or this:

$$\begin{equation} L(\beta_0,\beta_1)= \displaystyle \prod_{i=1}^{N}p(x_i)^{y_i}(1-p(x_i))^{1-y_i} \tag 2 \end{equation}$$

Why is there $\frac{n_i!} {y_i!(n_i-y_i)!}$ in equation 1?

Sources:

  1. First: https://czep.net/stat/mlelr.pdf (page 3 equ. 2)
  2. Second: http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf (page 5 equ. 12.6)

Note: This question is not a duplicate of What does "likelihood is only defined up to a multiplicative constant of proportionality" mean in practice? One can trace the answer back to the binomial distribution after seeing how it is done, but no one would have known that the question in that post answers this one.

gung - Reinstate Monica
user13985
  • That factor should be there, but if you are looking for the $\beta$ that maximises this function, then since the factor does not depend on $\beta$, it will have no influence on the $\beta$ at which the maximum occurs. By the way, you lost the $\Pi$ in the second formula. –  Sep 08 '17 at 16:07
  • Even after seeing the note (and, digging deeper, seeing the close and reopen), I too would have said "likelihood functions are defined up to proportionality" was the answer to this question. Here, it does not matter whether you know the order of the observations or not, as they lead to proportional likelihood functions – Henry Sep 09 '17 at 01:24

1 Answer


The second is a special case of the first. Your first reference discusses the case where each $y_i$ is distributed as a Binomial random variable with sample size $n_i$, while the second reference assumes each $y_i$ is a Bernoulli random variable. That is the difference: when each $n_i = 1$ and $y_i \in \{0, 1\}$, the binomial coefficient $\frac{n_i!} {y_i!(n_i-y_i)!} = 1$, and equation 1 reduces to equation 2.
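A quick numerical sketch of this (not from either reference; the value of $\pi_i$ is arbitrary): with $n_i = 1$, the binomial coefficient equals 1 for both possible outcomes, so the binomial pmf coincides with the Bernoulli pmf $\pi^{y}(1-\pi)^{1-y}$.

```python
from math import comb

# With n_i = 1, the binomial coefficient n_i! / (y_i! (n_i - y_i)!) is 1
# for y_i in {0, 1}, so the binomial pmf reduces to the Bernoulli pmf.
pi = 0.3  # arbitrary success probability, for illustration only
for y in (0, 1):
    binom_pmf = comb(1, y) * pi**y * (1 - pi)**(1 - y)
    bern_pmf = pi**y * (1 - pi)**(1 - y)
    assert comb(1, y) == 1
    assert binom_pmf == bern_pmf
```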

Some quotes supporting this: from 2.1.2 in the first reference:

Since the probability of success for any one of the $n_i$ trials is $\pi_i$...

And from the first section in the second reference 12.1:

Let's pick one of the classes and call it "$1$" and the other "$0$"...

Taylor