Binomial GLM with unknown totals?

Question

Essentially, I have some covariate data X and a dependent variable Y consisting of the proportion of each sample that showed a certain response (i.e. values between 0 and 1). I suspect I want to proceed via a GLM approach, but the thing is, I don't know the size of each of those samples!

My thought is to proceed with a quasibinomial approach, estimating the dispersion parameter. Assuming the sample sizes are not too different, I can keep the logit link between the linear predictor and the proportion, but let the estimated dispersion absorb the unknown n in the usual binomial variance (np(1-p) for the count, or p(1-p)/n for the proportion). Then I can do hypothesis testing the usual way?
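
To spell out the reasoning as I understand it (a sketch, assuming every sample has the same unknown size n and that the counts K_i really are binomial; the symbols K_i, p_i, phi and N are my own notation, with N the number of observations and k the number of fitted coefficients):

$$
Y_i = \frac{K_i}{n}, \qquad K_i \sim \mathrm{Binomial}(n, p_i), \qquad \operatorname{logit}(p_i) = x_i^\top \beta
$$

$$
\mathbb{E}[Y_i] = p_i, \qquad \operatorname{Var}(Y_i) = \frac{p_i (1 - p_i)}{n}
$$

$$
\text{quasibinomial: } \operatorname{Var}(Y_i) = \phi \, p_i (1 - p_i)
\quad\Longrightarrow\quad \phi = \frac{1}{n}, \qquad
\hat{\phi} = \frac{1}{N - k} \sum_{i=1}^{N} \frac{(y_i - \hat{p}_i)^2}{\hat{p}_i (1 - \hat{p}_i)}
$$

So with a common n the dispersion parameter should simply estimate 1/n, and since the estimating equations for beta do not involve phi, the coefficient estimates should match those of the binomial fit with known totals. If the n_i differ, phi can only stand in for something like an average of 1/n_i, so I would expect the standard errors to be approximate at best.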

Does this make any sense?

Some R code:

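# simulate some data
set.seed(1)              # arbitrary seed, only for reproducibility
X = rnorm(500)
Z = rnorm(500)           # noise covariate: has no effect on p
p = plogis(X*0.1 + 2)    # inverse logit of the linear predictor
n = 50                   # common sample size (pretend it is unknown when fitting Y)
# observed proportions: one binomial draw of size n per observation
Y = rbinom(length(X), size = n, prob = p)/n
# the same data as (successes, failures) counts, for the known-totals fit
Y2 = cbind(round(Y*n), n - round(Y*n))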

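# fit with unknown totals (quasibinomial) and with known totals (binomial);
# the binomial model is the 'true' one here
fit_q = glm(Y~X+Z, family = "quasibinomial")
fit_b = glm(Y2~X+Z, family = "binomial")
summary(fit_q)
summary(fit_b)
# note: ?anova.glm suggests test = "F" when the dispersion is estimated
anova(fit_q, test = "Chisq")
anova(fit_b, test = "Chisq")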
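
A more direct check of the idea, as a minimal sketch under the same assumptions as the simulation (one common sample size n, truly binomial counts): compare the estimated quasibinomial dispersion with 1/n, and put the coefficients and standard errors of the two fits side by side. The names fit_unknown, fit_known and phi_hat are just my own labels.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
N <- 500                             # number of observations
n <- 50                              # common sample size, hidden from the first fit
X <- rnorm(N)
Z <- rnorm(N)                        # noise covariate with no true effect
p <- plogis(2 + 0.1 * X)             # true response probabilities
K <- rbinom(N, size = n, prob = p)   # successes per sample
Y <- K / n                           # observed proportions

# totals unknown: proportions only, dispersion estimated from the data
fit_unknown <- glm(Y ~ X + Z, family = quasibinomial)
# totals known: the usual (successes, failures) binomial fit
fit_known <- glm(cbind(K, n - K) ~ X + Z, family = binomial)

# if the reasoning holds, the dispersion estimate should be close to 1/n
phi_hat <- summary(fit_unknown)$dispersion
print(c(phi_hat = phi_hat, one_over_n = 1 / n))

# coefficients should agree; standard errors should be similar
print(cbind(unknown = coef(fit_unknown), known = coef(fit_known)))
print(cbind(
  unknown = summary(fit_unknown)$coefficients[, "Std. Error"],
  known   = summary(fit_known)$coefficients[, "Std. Error"]
))
```

This only covers the equal-n case; I have not checked here what happens when the sample sizes vary a lot.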

Seems to work, but am I missing something? Surely someone's done something like this before?

Maybe this helps? http://stats.stackexchange.com/questions/24187/analyze-proportions — conjugateprior, Mar 21 '13 at 15:30
Possible duplicate of [Estimating parameters for a binomial](https://stats.stackexchange.com/questions/123367/estimating-parameters-for-a-binomial) — kjetil b halvorsen, Aug 01 '17 at 16:14
