The summaries report twice the negative log likelihood (evaluated at the parameter estimates). They look inconsistent: one reports $24.7444$ while the other reports $9.096343.$ How can that be, when the parameter estimates and standard errors are identical?
In the first model, the data are represented as a sequence of $(x,y)$ pairs where $y,$ an observation of a random variable $Y,$ is either $0$ or $1.$ Given a parameter $(\beta_0, \beta_1)$ representing the intercept and slope (respectively), the chance that $Y=1$ is the Bernoulli chance
$$\Pr(Y=1\mid x) = p(x;\beta) = \frac{1}{1 + \exp(-(\beta_0 + \beta_1 x))}$$
and (of course) the chance that $Y=0$ must be $1-p(x;\beta).$
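For concreteness, here is a small R sketch of that probability, evaluated at the parameter estimates $\hat\beta = (-2.0608, 0.5152)$ that appear in the code at the end of this post (the variable names are mine, not anything reported by the software):

logistic <- function(x) 1 / (1 + exp(-x))   # inverse logit, so p(x; beta) = logistic(beta0 + beta1 * x)
beta <- c(-2.0608, 0.5152)                  # (intercept, slope) estimates quoted below
p <- logistic(beta[1] + beta[2] * 2)        # Pr(Y = 1 | x = 2)
c(p, 1 - p)                                 # chances of y = 1 and y = 0 at x = 2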
In the example, there are five data with $x=2.$ Two of these have $y=1$ so they collectively contribute
$$\log \Pr(Y=1\mid x=2) + \log \Pr(Y=1\mid x=2) = 2\log p(2;\beta)$$
to the log likelihood associated with $\beta.$ The other three data with $x=2$ have $y=0,$ so they collectively contribute
$$3 \log \Pr(Y=0\mid x=2) = 3\log (1 - p(2;\beta))$$
to the log likelihood. The observations with $x=2$ therefore contribute an amount
$$2\log p(2;\beta) + 3\log(1-p(2;\beta))\tag{1}$$
to the log likelihood.
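Continuing the little sketch above, contribution $(1)$ at those estimates is a one-liner (its particular value is incidental; only its role in the sum matters):

2 * log(p) + 3 * log(1 - p)   # contribution (1) of the five observations with x = 2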
The second Binomial model gathers all the data for each separate $x$ value, regardless of the order in which they appear, and summarizes them by counting the number of $y$ values that equal $1$ (the "yes" values) and the number of $y$ values that equal $0$ (the "no" values). Let's call these numbers $k$ and $l$ respectively. The Binomial probability is
$$\Pr((k,l)\mid x) = \binom{k+l}{k} p(x;\beta)^k (1-p(x;\beta))^l.$$
For instance, when $x=2$ we see $k=2$ and $l=3,$ whence
$$\log \Pr((2,3)\mid x=2) = \log\binom{5}{2} + 2\log p(2;\beta) + 3 \log(1- p(2;\beta)).\tag{2}$$
Compared to $(1),$ this includes an extra additive term of $\log\binom{5}{2},$ which reflects the choice to neglect the order in which the data appear in the dataset.
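Continuing the sketch, R's dbinom returns exactly the log probability $(2),$ binomial coefficient included:

dbinom(2, size = 5, prob = p, log = TRUE)      # log Pr((2, 3) | x = 2), as in (2)
lchoose(5, 2) + 2 * log(p) + 3 * log(1 - p)    # the same value, assembled term by term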
Consequently, after everything has been added up to form the log likelihoods, we find the second one will exceed the first by
$$\log\binom{2+3}{2} + \log\binom{1+4}{1} + \log\binom{3+2}{3} + \log\binom{4+1}{4} \approx 7.82405.$$
Indeed,
$$9.096343 - 24.74444 \approx -15.6481 = -2\times 7.82405.$$
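These numbers are easy to confirm in R:

extra <- sum(lchoose(c(5, 5, 5, 5), c(2, 1, 3, 4)))   # the four log binomial coefficients
extra          # 7.824046
-2 * extra     # -15.6481, matching 9.096343 - 24.74444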
Why doesn't this matter? Because log likelihoods are only compared to one another (by subtracting suitable multiples). They are not interpreted as log probabilities. So long as you compute likelihoods in a consistent manner, any extra additive terms will cancel in such a subtraction. For instance, the comparison suggested by the output is between the "null deviance" and the "residual deviance." You can check these differences are identical in the two formulations:
$$27.726 - 24.7444 \approx 2.982 \approx 4.2576 - 1.2762.$$
(They differ a tiny bit in the last decimal place, but only due to rounding in the output.)
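As plain arithmetic on the quoted summary values:

27.726 - 24.7444   # deviance difference, binary formulation:  2.9816
4.2576 - 1.2762    # deviance difference, grouped formulation: 2.9814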
The moral of the story is that the reported values of log likelihoods and deviances in software summaries are in themselves meaningless. Meaning attaches only to suitable differences: so when you make such comparisons, please make sure you are using the same algorithm to compute both.
Another consequence is that when you re-do a Maximum Likelihood model using different software (perhaps as a check), be prepared to see it report different log likelihoods. Any relevant differences, though, should equal the original differences, at least up to the precision with which the programs do their computing. (It is not unusual to see the reported optimal log likelihoods differ in the second or even first decimal place in difficult problems due to the use of different optimization procedures and error tolerances.)
I did some calculations in R to confirm this interpretation. Here they are without further comment: they parallel this post and so should be self-explanatory.
logistic <- function(x) 1 / (1 + exp(-x)) # Common helper function
#
# Log likelihood for binary 0/1 responses.
#
Lambda <- function(beta, x, y) {
  p <- logistic(beta[1] + beta[2] * x)
  sum(y * log(p) + (1 - y) * log(1 - p))
}
# For example:
x <- c(2,2,2,2,2,3,3,3,3,3,5,5,5,5,5,6,6,6,6,6)
y <- c(1,1,0,0,0,1,0,0,0,0,1,1,1,0,0,1,1,1,1,0)
beta <- c(-2.0608, 0.5152)
-2 * Lambda(beta, x, y) # 24.74444
#------------------------------------------------------------------------------#
#
# Log likelihood for summarized (count) responses.
#
Lambda.0 <- function(beta, x, success, failure, with.binomial=TRUE) {
  p <- logistic(beta[1] + beta[2] * x)
  cnst <- if (isTRUE(with.binomial)) sum(lchoose(success + failure, success)) else 0
  cnst + sum(success * log(p) + failure * log(1 - p))
}
# For example:
x.0 <- c(2,3,5,6)
yes <- c(2,1,3,4)
no <- c(3,4,2,1)
-2 * Lambda.0(beta, x.0, yes, no) # 9.096343: includes log binomial coefficients
-2 * Lambda.0(beta, x.0, yes, no, with.binomial=FALSE) # 24.74444
-2 * sum(lchoose(yes + no, yes)) # -15.6481 = 9.096343 - 24.74444
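Finally, here is a sketch of how the two summaries themselves could be reproduced with glm, assuming the vectors defined above are still in the workspace; this is my own check, not output copied from the question.

fit.binary  <- glm(y ~ x, family = binomial)                 # one row per 0/1 observation
fit.grouped <- glm(cbind(yes, no) ~ x.0, family = binomial)  # one row per x value
-2 * logLik(fit.binary)    # should be close to 24.74444
-2 * logLik(fit.grouped)   # should be close to 9.096343 (it includes the log binomial coefficients)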