Long versus short format in fitting a binomial GLM

Question

This question arises after reading the question Input format for response in binomial glm in R and its answer.

Are the long and short formats for supplying data to a binomial GLM truly equivalent? Implementation options 2 and 3 in the original question do seem to be the same, but option 1 (the long format) seems to lose any chance of identifying potential non-independence in the responses.

In the question Input format for response in binomial glm in R, the single answer notes that "There's no statistical reason to prefer one to the other, besides conceptual clarity."

When using the long format, you treat every row as an individual observation, a Bernoulli trial, say. With the short format, whether as cbind(successes, failures) or as the proportion of successes with the number of trials as weights, you keep a data structure that lets you actually try to evaluate whether there really is independence across the realizations of the multiple binomial observations, something that the series of Bernoullis in the long format does not allow you to assess. In that sense, it would seem that the two options are not really equivalent?
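To make the comparison concrete, here is a minimal sketch in plain Python (not R's glm, so that the arithmetic is visible). The data, the coefficient values, and all function names are invented for illustration. It shows that the Bernoulli and binomial log-likelihoods differ only by a constant that does not involve the coefficients, so the estimates coincide, while the deviance and the residual degrees of freedom depend on the format because each format has a different saturated model.

```python
import math

# Hypothetical grouped data, invented for illustration: (x, trials, successes).
GROUPED = [(0.0, 20, 3), (1.0, 20, 7), (2.0, 20, 12), (3.0, 20, 17)]

# Sum of log binomial coefficients: the only difference between the two likelihoods.
LOG_BINOM_CONST = sum(math.log(math.comb(n, s)) for _, n, s in GROUPED)


def prob(b0, b1, x):
    """Inverse logit of the linear predictor b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def loglik_long(b0, b1):
    """Bernoulli log-likelihood: one term per 0/1 row of the expanded data."""
    total = 0.0
    for x, n, s in GROUPED:
        p = prob(b0, b1, x)
        for y in [1] * s + [0] * (n - s):
            total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def loglik_grouped(b0, b1):
    """Binomial log-likelihood: one term per group of trials."""
    total = 0.0
    for x, n, s in GROUPED:
        p = prob(b0, b1, x)
        total += math.log(math.comb(n, s)) + s * math.log(p) + (n - s) * math.log(1.0 - p)
    return total


def deviance_long(b0, b1):
    """Long format: the saturated model fits every 0/1 row exactly (log-likelihood 0)."""
    return -2.0 * loglik_long(b0, b1)


def deviance_grouped(b0, b1):
    """Short format: the saturated model sets each group's probability to s / n.

    Assumes 0 < s < n in every group, which holds for the data above.
    """
    sat = sum(
        math.log(math.comb(n, s)) + s * math.log(s / n) + (n - s) * math.log(1.0 - s / n)
        for _, n, s in GROUPED
    )
    return 2.0 * (sat - loglik_grouped(b0, b1))


# Residual degrees of freedom with two coefficients (intercept and slope).
DF_LONG = sum(n for _, n, _ in GROUPED) - 2  # rows minus parameters
DF_GROUPED = len(GROUPED) - 2                # groups minus parameters

for b0, b1 in [(-1.0, 0.5), (0.3, -0.2)]:
    gap = loglik_grouped(b0, b1) - loglik_long(b0, b1)
    print(f"b0={b0}, b1={b1}: log-likelihood gap = {gap:.6f}")
    print(f"  deviance long = {deviance_long(b0, b1):.3f} on {DF_LONG} df")
    print(f"  deviance grouped = {deviance_grouped(b0, b1):.3f} on {DF_GROUPED} df")
```

The log-likelihood gap is the same at both coefficient settings, which is why the point estimates and standard errors cannot distinguish the formats. The deviances differ, and only the grouped deviance is measured against the observed group proportions; whether that is enough to assess independence within groups is the point the question leaves open.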
