
I read in one of the textbooks that for ungrouped binary data the dispersion parameter should always be $\phi = 1$.

Do you know why this is the case?

shani
  • See https://stats.stackexchange.com/questions/386675/what-are-weights-in-a-binary-glm-and-how-to-calculate-them/386913#386913 – Gordon Smyth Feb 04 '22 at 09:21
  • @GordonSmyth In this link you have written that $\operatorname{var}(y_i) = \mu_i(1-\mu_i)/w_i$. I suppose $1/w_i$ is the variance inflation factor. I wonder how you concluded that "it is impossible for the variance to be anything other than $\mu_i(1-\mu_i)$". What is the logic behind this statement? – shani Feb 06 '22 at 03:37
  • It is one line of mathematics. Try computing the variance for yourself and you will see. – Gordon Smyth Feb 06 '22 at 05:23

1 Answer


Suppose $Y$ is a binary random variable that takes value 1 with probability $p$ and 0 with probability $1-p$.

Then $$E(Y)=0(1-p)+1p=p$$ and $$\mbox{var}(Y)=E(Y^2)-E(Y)^2=0^2(1-p)+1^2p-p^2=p(1-p).$$
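The same two-point calculation can be checked with exact rational arithmetic. This is a small Python sketch; the helper name `bernoulli_moments` is hypothetical, and it simply enumerates the two outcomes of $Y$ with their probabilities:

```python
from fractions import Fraction

def bernoulli_moments(p):
    """Return (E(Y), var(Y)) for a binary Y with P(Y=1) = p,
    computed by enumerating the two outcomes 0 and 1."""
    outcomes = [(0, 1 - p), (1, p)]                 # (value, probability)
    mean = sum(y * prob for y, prob in outcomes)     # 0(1-p) + 1p
    second = sum(y**2 * prob for y, prob in outcomes)  # E(Y^2)
    return mean, second - mean**2                    # var = E(Y^2) - E(Y)^2

mean, var = bernoulli_moments(Fraction(3, 10))
print(mean, var)  # 3/10 21/100, i.e. p and p(1-p)
```

Using `Fraction` instead of floats makes the check exact, so `var == p * (1 - p)` holds with no rounding.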

This shows that the variance of $Y$ is a function of the mean, i.e., the variance is completely determined by the mean. Hence there are no unknown parameters to estimate and there cannot be any overdispersion or underdispersion.
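In GLM notation the variance is written $\operatorname{var}(Y)=\phi\,\mu(1-\mu)$ with $\mu=p$, so the calculation above forces $\phi=1$. One way to see this numerically, assuming the standard Pearson-based estimator of $\phi$ (the sum of $(y_i-\mu_i)^2/\{\mu_i(1-\mu_i)\}$ divided by the residual degrees of freedom), is that each binary observation's term, evaluated at its true mean, has expectation exactly 1 for every $p$. The function name below is hypothetical:

```python
from fractions import Fraction

def expected_pearson_term(p):
    """E[(Y - p)^2 / (p(1 - p))] for a binary Y with P(Y=1) = p, 0 < p < 1.
    This equals var(Y) / (p(1 - p)), i.e. the dispersion phi."""
    var_fn = p * (1 - p)  # binomial variance function V(mu) with mu = p
    return sum(prob * (y - p) ** 2 / var_fn for y, prob in [(0, 1 - p), (1, p)])

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    print(p, expected_pearson_term(p))  # always 1
```

Because the result is 1 for every $p$, there is nothing left for a separate dispersion parameter to absorb.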

Gordon Smyth
  • So, in the case of the binomial distribution, the variance depends on both $n$ and $p$. Does that mean overdispersion is caused by $n$ and $p$? – shani Feb 06 '22 at 08:17
  • @shani Overdispersion is caused by dependence. If $n>1$ and the trials are independent then the sum is binomial, i.e., not overdispersed. If the trials are positively dependent then the sum is overdispersed relative to binomial. This is explained in the link I gave you in my comment above. If $n=1$ there is only one trial, so nothing to be correlated with, hence no overdispersion. – Gordon Smyth Feb 06 '22 at 10:03
  • @shani The other way for overdispersion to arise is when there are $n>1$ trials and the trials are independent but the success probability is not constant from one trial to another. That will also lead to a variable that is overdispersed relative to binomial. Again, this cannot occur for $n=1$. In summary, the binomial distribution assumes $n$ independent trials with constant success probability. Failure of either assumption (independence or constancy) can lead to overdispersion. Neither failure can occur when $n=1$. – Gordon Smyth Feb 06 '22 at 10:17
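The $n=1$ versus $n>1$ contrast can be illustrated with exact arithmetic. This is a sketch under one assumed model, not the only way overdispersion arises: each group draws a single success probability $p$ from a discrete distribution `p_dist`, and its $n$ trials are independent given $p$. Sharing $p$ within a group makes the trials positively correlated marginally. The helper names are hypothetical. Under this model $\operatorname{var}(S) = n m(1-m) + n(n-1)\operatorname{var}(p)$ with $m=E(p)$, so the excess over binomial vanishes when $n=1$:

```python
from fractions import Fraction
from math import comb

def mixed_sum_variance(n, p_dist):
    """Exact var(S), S = Y_1 + ... + Y_n, when one p is drawn from p_dist
    (a list of (p, weight) pairs, weights summing to 1) and the n trials
    are then independent Bernoulli(p) given p."""
    pmf = [Fraction(0)] * (n + 1)
    for p, w in p_dist:
        for k in range(n + 1):
            pmf[k] += w * comb(n, k) * p**k * (1 - p) ** (n - k)
    mean = sum(k * pk for k, pk in enumerate(pmf))
    second = sum(k * k * pk for k, pk in enumerate(pmf))
    return second - mean**2

def binomial_variance(n, p_dist):
    """Variance of Binomial(n, m) with m = E(p): the no-overdispersion benchmark."""
    m = sum(w * p for p, w in p_dist)
    return n * m * (1 - m)

p_dist = [(Fraction(1, 5), Fraction(1, 2)), (Fraction(3, 5), Fraction(1, 2))]
for n in (1, 3):
    print(n, mixed_sum_variance(n, p_dist), binomial_variance(n, p_dist))
# 1 6/25 6/25    (n = 1: variance is still m(1 - m))
# 3 24/25 18/25  (n = 3: excess n(n-1)var(p) = 6 * 1/25)
```

With a single trial the mixture is just a Bernoulli variable with mean $m$, so its variance is fixed at $m(1-m)$; only with $n>1$ does the shared $p$ inflate the variance of the sum.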