The summaries report twice the negative log likelihood (evaluated at the parameter estimates). They look inconsistent: one reports $24.7444$ while the other reports $9.096343.$ How can that be, when the parameter estimates and standard errors are identical?
In the first model, the data are represented as a sequence of $(x,y)$ pairs where $y,$ an observation of a random variable $Y,$ is either $0$ or $1.$ Given a parameter $(\beta_0, \beta_1)$ representing the intercept and slope (respectively), the chance that $Y=1$ is the Bernoulli chance
$$\Pr(Y=1\mid x) = p(x;\beta) = \frac{1}{1 + \exp(-(\beta_0 + \beta_1 x))}$$
and (of course) the chance that $Y=0$ must be $1-p(x;\beta).$
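For concreteness, here is a small R sketch of that probability, evaluated at the parameter estimates $\hat\beta = (-2.0608, 0.5152)$ that appear in the code at the end of this post (the variable names are mine, not anything reported by the software):

logistic <- function(x) 1 / (1 + exp(-x))   # inverse logit, so p(x; beta) = logistic(beta0 + beta1 * x)
beta <- c(-2.0608, 0.5152)                  # (intercept, slope) estimates quoted below
p <- logistic(beta[1] + beta[2] * 2)        # Pr(Y = 1 | x = 2)
c(p, 1 - p)                                 # chances of y = 1 and y = 0 at x = 2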
In the example, there are five data with $x=2.$ Two of these have $y=1$ so they collectively contribute
$$\log \Pr(Y=1\mid x=2) + \log \Pr(Y=1\mid x=2) = 2\log p(2;\beta)$$
to the log likelihood associated with $\beta.$ The other three data with $x=2$ have $y=0,$ so they collectively contribute
$$3 \log \Pr(Y=0\mid x=2) = 3\log (1 - p(2;\beta))$$
to the log likelihood. The observations with $x=2$ therefore contribute an amount
$$2\log p(2;\beta) + 3\log(1-p(2;\beta))\tag{1}$$
to the log likelihood.
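Continuing the little sketch above, contribution $(1)$ at those estimates is a one-liner (its particular value is incidental; only its role in the sum matters):

2 * log(p) + 3 * log(1 - p)   # contribution (1) of the five observations with x = 2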
The second Binomial model gathers all the data for each separate $x$ value, regardless of the order in which they appear, and summarizes them by counting the number of $y$ values that equal $1$ (the "yes" values) and the number of $y$ values that equal $0$ (the "no" values). Let's call these numbers $k$ and $l$ respectively. The Binomial probability is
$$\Pr((k,l)\mid x) = \binom{k+l}{k} p(x;\beta)^k (1-p(x;\beta))^l.$$
For instance, when $x=2$ we see $k=2$ and $l=3,$ whence
$$\log \Pr((2,3)\mid x=2) = \log\binom{5}{2} + 2\log p(2;\beta) + 3 \log(1- p(2;\beta)).\tag{2}$$
Compared to $(1),$ this includes an extra additive term of $\log\binom{5}{2},$ which reflects the choice to neglect the order in which the data appear in the dataset.
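Continuing the sketch, R's dbinom returns exactly the log probability $(2),$ binomial coefficient included:

dbinom(2, size = 5, prob = p, log = TRUE)      # log Pr((2, 3) | x = 2), as in (2)
lchoose(5, 2) + 2 * log(p) + 3 * log(1 - p)    # the same value, assembled term by term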
Consequently, after everything has been added up to form the log likelihoods, we find the second one will exceed the first by
$$\log\binom{2+3}{2} + \log\binom{1+4}{1} + \log\binom{3+2}{3} + \log\binom{4+1}{4} \approx 7.82405.$$
Indeed,
$$9.096343 - 24.74444 \approx -15.6481 = -2\times 7.82405.$$
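These numbers are easy to confirm in R:

extra <- sum(lchoose(c(5, 5, 5, 5), c(2, 1, 3, 4)))   # the four log binomial coefficients
extra          # 7.824046
-2 * extra     # -15.6481, matching 9.096343 - 24.74444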
Why doesn't this matter? Because log likelihoods are only compared to one another (by subtracting suitable multiples). They are not interpreted as log probabilities. So long as you compute likelihoods in a consistent manner, any extra additive terms will cancel in such a subtraction. For instance, the comparison suggested by the output is between the "null deviance" and the "residual deviance." You can check these differences are identical in the two formulations:
$$27.726 - 24.7444 \approx 2.982 \approx 4.2576 - 1.2762.$$
(They differ a tiny bit in the last decimal place, but only due to rounding in the output.)
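As plain arithmetic on the quoted summary values:

27.726 - 24.7444   # deviance difference, binary formulation:  2.9816
4.2576 - 1.2762    # deviance difference, grouped formulation: 2.9814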
The moral of the story is that the reported values of log likelihoods and deviances in software summaries are in themselves meaningless. Meaning attaches only to suitable differences: so when you make such comparisons, please make sure you are using the same algorithm to compute both.
Another consequence is that when you re-do a Maximum Likelihood model using different software (perhaps as a check), be prepared to see it report different log likelihoods. Any relevant differences, though, should equal the original differences, at least up to the precision with which the programs do their computing. (It is not unusual to see the reported optimal log likelihoods differ in the second or even first decimal place in difficult problems due to the use of different optimization procedures and error tolerances.)
I did some calculations in R to confirm this interpretation. Here they are without further comment: they parallel this post and so should be self-explanatory.
logistic <- function(x) 1 / (1 + exp(-x)) # Common helper function
#
# Log likelihood for binary 0/1 responses.
#
Lambda <- function(beta, x, y) {
  p <- logistic(beta[1] + beta[2] * x)
  sum(y * log(p) + (1 - y) * log(1 - p))
}
# For example:
x <- c(2,2,2,2,2,3,3,3,3,3,5,5,5,5,5,6,6,6,6,6)
y <- c(1,1,0,0,0,1,0,0,0,0,1,1,1,0,0,1,1,1,1,0)
beta <- c(-2.0608, 0.5152)
-2 * Lambda(beta, x, y) # 24.74444
#------------------------------------------------------------------------------#
#
# Log likelihood for summarized (count) responses.
#
Lambda.0 <- function(beta, x, success, failure, with.binomial=TRUE) {
  p <- logistic(beta[1] + beta[2] * x)
  cnst <- if (isTRUE(with.binomial)) sum(lchoose(success + failure, success)) else 0
  cnst + sum(success * log(p) + failure * log(1 - p))
}
# For example:
x.0 <- c(2,3,5,6)
yes <- c(2,1,3,4)
no <- c(3,4,2,1)
-2 * Lambda.0(beta, x.0, yes, no) # 9.096343: includes log binomial coefficients
-2 * Lambda.0(beta, x.0, yes, no, with.binomial=FALSE) # 24.74444
-2 * sum(lchoose(yes + no, yes)) # -15.6481 = 9.096343 - 24.74444
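Finally, here is a sketch of how the two summaries themselves could be reproduced with glm, assuming the vectors defined above are still in the workspace; this is my own check, not output copied from the question.

fit.binary  <- glm(y ~ x, family = binomial)                 # one row per 0/1 observation
fit.grouped <- glm(cbind(yes, no) ~ x.0, family = binomial)  # one row per x value
-2 * logLik(fit.binary)    # should be close to 24.74444
-2 * logLik(fit.grouped)   # should be close to 9.096343 (it includes the log binomial coefficients)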