I am very confused about how the `weights` argument works in `glm` with `family = "binomial"`. In my understanding, the likelihood of a GLM with `family = "binomial"` is specified as follows:
$$
f(y) =
{n\choose{ny}} p^{ny} (1-p)^{n(1-y)} = \exp \left(n \left[ y \log \frac{p}{1-p} - \left(-\log (1-p)\right) \right] + \log {n \choose ny}\right)
$$
where $y$ is the "proportion of observed success" and $n$ is the known number of trials.
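As a quick sanity check of this expression (my own snippet, with arbitrary values of $n$, $ny$ and $p$), it agrees numerically with `dbinom` evaluated at the number of successes:

n  <- 7                       ## number of trials
ny <- 3                       ## number of observed successes
y  <- ny / n                  ## proportion of successes
p  <- 0.4
lhs <- dbinom(ny, size = n, prob = p, log = TRUE)
rhs <- n * (y * log(p / (1 - p)) - (-log(1 - p))) + lchoose(n, ny)
all.equal(lhs, rhs)           ## TRUE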
In my understanding, the probability of success $p$ is parametrized by some linear coefficients $\beta$ as $p = p(\beta)$, and `glm` with `family = "binomial"` searches for:
$$
\textrm{arg}\max_{\beta} \sum_i \log f(y_i).
$$
Since $\log {n_i \choose n_i y_i}$ does not depend on $\beta$, this optimization problem simplifies to:
$$
\begin{aligned}
\textrm{arg}\max_{\beta} \sum_i \log f(y_i)
&= \textrm{arg}\max_{\beta} \sum_i \left\{ n_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} - \left(-\log (1-p(\beta))\right) \right] + \log {n_i \choose n_i y_i} \right\} \\
&= \textrm{arg}\max_{\beta} \sum_i n_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} - \left(-\log (1-p(\beta))\right) \right].
\end{aligned}
$$
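To convince myself that dropping the $\log {n_i \choose n_i y_i}$ term is harmless, here is a small check of my own (for an intercept-only model, where $\log\frac{p}{1-p}=\beta_0$, with arbitrary made-up values of $n_i$ and $y_i$): the full and simplified objectives differ only by a constant that does not involve $\beta_0$.

n <- c(2, 5, 3)                ## numbers of trials (arbitrary)
s <- c(1, 3, 2)                ## numbers of successes (arbitrary)
y <- s / n                     ## proportions of successes
obj_full  <- function(b0) sum(n * (y * b0 - log1p(exp(b0))) + lchoose(n, s))
obj_short <- function(b0) sum(n * (y * b0 - log1p(exp(b0))))
obj_full(0.3)  - obj_short(0.3)    ## same constant ...
obj_full(-1.7) - obj_short(-1.7)   ## ... whatever b0 is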
Therefore, if we set $n_i^* = c \, n_i$ for all $i=1,\dots,N$ and some constant $c>0$, it must also be true that:
$$
\textrm{arg}\max_{\beta} \sum_i \log f(y_i)
=
\textrm{arg}\max_{\beta} \sum_i n^*_i \left[ y_i \log \frac{p(\beta)}{1-p(\beta)} - \left(-\log (1-p(\beta))\right) \right].
$$
From this, I concluded that scaling the number of trials $n_i$ by a constant does NOT affect the maximum likelihood estimate of $\beta$, given the proportions of success $y_i$.
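Indeed, if I maximize the simplified weighted log-likelihood directly (my own sketch for an intercept-only model, outside of `glm`, using `optimize`, and using the same data as in my example below), scaling the weights by 1000 leaves the maximizer unchanged:

y <- c(1, 0, 0, 0)        ## proportions of successes
n <- 1:4                  ## numbers of trials
loglik <- function(b0, n) sum(n * (y * b0 - log1p(exp(b0))))
optimize(loglik, c(-20, 20), n = n,        maximum = TRUE)$maximum  ## about -2.197
optimize(loglik, c(-20, 20), n = n * 1000, maximum = TRUE)$maximum  ## essentially the same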
The help file of `glm` says: "For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes".
Therefore I expected that scaling the weights would not affect the estimated $\beta$ when the response is the proportion of successes. However, the following two calls return different coefficient estimates:
Y <- c(1,0,0,0) ## proportion of observed success
w <- 1:length(Y) ## weight= the number of trials
glm(Y~1,weights=w,family=binomial)
This yields:
Call: glm(formula = Y ~ 1, family =
"binomial", weights = w)
Coefficients:
(Intercept)
-2.197
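This first result is what I would expect: the intercept equals the logit of the weighted proportion of successes (a quick check of mine, using `Y` and `w` from above):

qlogis(sum(w * Y) / sum(w))   ## log(0.1 / 0.9) = -2.197225, matches the intercept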
However, if I multiply all the weights by 1000, the estimated coefficient is completely different:
glm(Y~1,weights=w*1000,family=binomial)
Call: glm(formula = Y ~ 1, family = binomial,
weights = w * 1000)
Coefficients:
(Intercept)
-3.153e+15
I have seen many other examples like this, even with more moderate scaling of the weights. What is going on here?