Count explanatory variable, proportion dependent variable

Question

I am having a little trouble coming up with a way of analyzing my data. If there is a short answer (i.e., "use logistic regression, dummy") you can just post that and I'll do some digging on my own - I just need to be pointed in the right direction...

My independent variable is a count and my dependent variable is a ratio. Here is the data:

success <- c(322,358,323,277)
total.trials <- c(540,533,507,540)
count <- c(23,13,21,39)
ratio <- success/total.trials

IIRC, it's wrong to do a simple linear regression of ratio ~ count, so what method should I use here? Thanks for the help.

Okay, here's some of the code I ran after following gung's advice to use a GEE:

library(gee)  # gee() comes from the gee package
subject <- c(1, 2, 3, 4)
success <- c(322, 358, 323, 277)
total <- c(540, 533, 507, 540)
count <- c(23, 13, 21, 39)
data <- cbind(success,total)

gee.model <- gee(data ~ count, id = subject, family = 'binomial')

summary(gee.model)

GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
gee S-function, version 4.13 modified 98/01/27 (1998) 

Model:
Link:                      Logit 
Variance to Mean Relation: Binomial 
Correlation Structure:     Independent 

Call:
gee(formula = data ~ count, id = subject, family = "binomial")

Summary of Residuals:
     Min       1Q   Median       3Q      Max 
  276.6608 310.3817 322.1195 331.3620 357.5969 


Coefficients:
               Estimate  Naive S.E.   Naive z  Robust S.E.  Robust z
(Intercept) -0.25516680 0.031437649 -8.116599 0.0134033383 -19.03756
count       -0.01055972 0.001244121 -8.487698 0.0002616798 -40.35360

Estimated Scale Parameter:  0.1066564
Number of Iterations:  1

Working Correlation
     [,1]
[1,]    1

Does this look correct? And, if I am interpreting it correctly, there is a significant effect of count on the proportion.
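
One quick way to sanity-check a fit like this is to back-transform the reported coefficients to the probability scale and compare them with the observed proportions. The sketch below uses only the numbers shown above; the names `b0`, `b1`, `fitted.p`, and `check` are introduced here for illustration. It rests on the assumption that a two-column binomial response is read as (successes, failures), which is how R's binomial family documents it for `glm()`.

```r
# Data and coefficients copied from the question and its summary output.
success <- c(322, 358, 323, 277)
total   <- c(540, 533, 507, 540)
count   <- c(23, 13, 21, 39)

b0 <- -0.25516680   # reported intercept
b1 <- -0.01055972   # reported slope for count

# Back-transform the linear predictor to the probability scale.
fitted.p <- plogis(b0 + b1 * count)

# Compare the fitted values with two candidate "observed" proportions.
check <- data.frame(
  count         = count,
  fitted        = round(fitted.p, 3),
  succ.over.tot = round(success / total, 3),
  succ.over.sum = round(success / (success + total), 3)
)
print(check)
```

On these numbers the fitted values sit close to success / (success + total) and well below success / total, which suggests the second column was treated as failures. If so, it may be worth refitting with `cbind(success, total - success)` as the response before interpreting the effect of count.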

Welcome to the site. For count data, you should look into Poisson regression. In general, the family of models you need is determined by the type of dependent variable you have: a continuous DV usually points you toward linear regression, a binary one toward logistic regression, and a count DV toward Poisson or negative binomial regression. [This question](http://stats.stackexchange.com/questions/7535) and [this one](http://stats.stackexchange.com/questions/20826) may be good places to start, even though they might touch on slightly more complicated cases than the one you are dealing with. — Antoine Vernet, Jan 17 '13 at 23:13
@AntoineVernet, thank you very much - I'll look into the Poisson and NegBin methods. — Pseudo_Scientist, Jan 17 '13 at 23:18
@DimitriyV.Masterov - Yes, apologies, I forgot to mention that the ratio is actually a percentage, so bounded from 0 to 1. Forgive my misuse of terminology. Thank you. — Pseudo_Scientist, Jan 17 '13 at 23:19
Do you mean a proportion? In that case, I believe you can use beta regression. — dimitriy, Jan 17 '13 at 23:25
Regression models make no assumptions about the distribution of explanatory variables, so to a first approximation, I wouldn't worry about the fact that your IV is a count. Your response is a binomial, however, so some form of logistic regression is appropriate. How is it that you have so many trials per count? Are these lots of independent observations at a few pre-specified levels of the EV? Are they lots of trials from the same experimental units? — gung - Reinstate Monica, Jan 17 '13 at 23:26
Yes, apologies again. So in the first example there were 540 trials, 322 of which were considered a success. Now I want to relate that proportion (322/540) to the count value of 23. — Pseudo_Scientist, Jan 17 '13 at 23:28
@gung yes, a subject ran a large number of trials (the binomial portion, DV) and we only have one data point (the count) to investigate relationships with. — Pseudo_Scientist, Jan 17 '13 at 23:30
@Pseudo_Scientist I just realized I misread you and assumed your DV was the count variable. If your DV is a ratio, Poisson or negative binomial regression is not appropriate. Apologies. — Antoine Vernet, Jan 17 '13 at 23:35
Answers like "use logistic regression, dummy" (which also appears to be a reasonable answer to your question, in fact) are not suitable for StackExchange, I'm afraid, so while your preparedness to look further on your own is admirable, we'd still have to give a more substantive answer. — Glen_b, Feb 12 '15 at 05:25

score 5 · Accepted Answer · edited Apr 13 '17 at 12:44

You have a binary response (each trial is a success or a failure), and that is the important part. The fact that your explanatory variable is a count doesn't matter. As a result, you should be doing some form of logistic regression. What makes this more difficult is that your data are clustered within just four participants. That means you need to use either a generalized linear mixed model (GLMM) or generalized estimating equations (GEE). This is a subtle decision, but I discuss it at some length here: Difference between generalized linear models & generalized linear mixed models in SPSS. Depending on the options your software affords you, you may also have to un-group your data, so that you have a (very long) matrix in which the response listed in each row is a 1 or a 0.
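
The two data layouts mentioned above can be sketched in base R as follows. This is a minimal illustration, not a complete analysis: it uses plain `glm()`, which treats every trial as independent and so ignores the clustering within subjects. The names `fit.grouped`, `long`, and `fit.long` are made up for this example.

```r
# Data from the question.
subject <- c(1, 2, 3, 4)
success <- c(322, 358, 323, 277)
total   <- c(540, 533, 507, 540)
count   <- c(23, 13, 21, 39)

# Grouped form: the two-column response is (successes, failures).
fit.grouped <- glm(cbind(success, total - success) ~ count,
                   family = binomial)

# Un-grouped form: one row per trial, response coded 1 or 0.
long <- data.frame(
  subject = rep(subject, times = total),
  count   = rep(count,   times = total),
  y       = unlist(mapply(function(s, n) rep(c(1, 0), times = c(s, n - s)),
                          success, total, SIMPLIFY = FALSE))
)
fit.long <- glm(y ~ count, family = binomial, data = long)

# Both layouts give the same coefficient estimates.
print(coef(fit.grouped))
print(coef(fit.long))
```

The `long` data frame is the shape a clustered method would typically expect, for example with `id = subject` passed to `gee()`. The standard errors from these plain `glm()` fits should not be relied on here, since they assume independent trials.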

Okay, fantastic. I'll look into this. I had a feeling logistic regression needed to be used, but I was unsure how to implement it in this structure. Thank you. — Pseudo_Scientist, Jan 17 '13 at 23:40
