In many settings, we are interested in estimating a model with a fractional dependent variable. For example, [Papke & Wooldridge (1996)](http://faculty.smu.edu/millimet/classes/eco6375/papers/papke%20wooldridge%201996.pdf) consider 401(k) plan participation rates, where the rate is defined as $PRATE=\frac{accounts}{employees}$. The authors then develop a GLM method to estimate such models. Looking at the count data literature, I wonder whether one should instead run a Poisson regression of $accounts$ on the same set of regressors, with $employees$ as an offset. Does the answer potentially depend on the absolute number of $accounts$?

This is different from the suggested duplicate, "What regression model is the most appropriate to use with count data?", as my question concerns the correct place of the offset / denominator.

Felix H
  • ... as an offset log(employees) ;-) (if a log link is used)! IMHO you get the same results; which scale you want (prefer) to interpret them on is just a matter of taste... – Ivan Kshnyasev Jun 29 '16 at 10:22
  • Possible duplicate of [What regression model is the most appropriate to use with count data?](https://stats.stackexchange.com/questions/204696/what-regression-model-is-the-most-appropriate-to-use-with-count-data) – kjetil b halvorsen Oct 11 '17 at 22:48
  • I don't think so. I am asking about count data with a very clear offset / exposure variable, and about when to model something as a rate or as a count. – Felix H Oct 13 '17 at 11:29
  • You must use log(employees) as the offset. Can you give more details of your application? A very detailed discussion of the how/why of offsets is in https://stats.stackexchange.com/questions/142338/goodness-of-fit-and-which-model-to-choose-linear-regression-or-poisson/142353#142353, and you could also look at https://stats.stackexchange.com/questions/307369/how-to-interpret-glm-and-ols-with-offset/307383#307383 (both are better duplicate targets than the one proposed above). – kjetil b halvorsen Oct 13 '17 at 11:36

1 Answer

One reason not to use Poisson regression here is that, since each employee can have at most one account, the number of accounts is bounded by the number of employees. A Poisson distribution would allow nonzero probability for the number of accounts exceeding the number of employees. My understanding is that although Poisson regressions are robust to a lot of violations of assumptions, you'd at least get a loss of efficiency from using a Poisson regression compared to something more appropriate.
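To put a rough number on the boundedness point, here is a minimal sketch that computes how much probability a Poisson distribution places above the number of employees. The firm size (20) and participation rate (0.9) are hypothetical values chosen only for illustration, and the helper names are made up for this example; nothing here comes from the Papke & Wooldridge data.

```python
import math


def poisson_pmf(k, mu):
    """P(Y = k) for Y ~ Poisson(mu), computed in log space for stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def poisson_tail_above(n, mu):
    """P(Y > n) for Y ~ Poisson(mu), i.e. mass on impossible counts."""
    return 1.0 - sum(poisson_pmf(k, mu) for k in range(n + 1))


# Hypothetical firm: 20 employees, participation rate 0.9 (assumed values).
n_employees = 20
p = 0.9
mu = n_employees * p  # Poisson mean matched to the expected number of accounts

tail = poisson_tail_above(n_employees, mu)
print(f"P(accounts > {n_employees}) under Poisson(mean={mu:g}): {tail:.3f}")

# The same check with a low rate: the mass above the bound becomes negligible.
tail_low = poisson_tail_above(1000, 1000 * 0.01)
print(f"P(accounts > 1000) under Poisson(mean=10): {tail_low:.2e}")
```

With a high participation rate, a sizeable share of the Poisson mass sits on counts that cannot occur; with a low rate the violation is negligible. This suggests the issue is driven more by how close the rate is to one than by the absolute number of accounts.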

The question then should be: wouldn't a binomial regression be more appropriate? (Assuming the same participation rate $p$ for each employee and independence across employees, the number of accounts $y$ should be distributed as $\mathrm{Binomial}(n,p)$, where $n$ is the number of employees.) IIRC, the reason a binomial regression can't be employed in this case is that the number of employees is not known; only the participation rate itself is known. That rules out binomial regression, and it would also rule out Poisson regression with an offset, even if it were appropriate.
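For reference, the two candidate specifications can be written side by side for the case where the number of employees is observed. This is only a sketch: the index $i$ (plans), the covariate vector $x_i$, and the coefficient vector $\beta$ are notation assumed here, not taken from the question.

$$
\begin{aligned}
\text{Binomial, logit link:}\quad & y_i \sim \mathrm{Binomial}(n_i, p_i), & \log\frac{p_i}{1-p_i} &= x_i'\beta, & \operatorname{Var}(y_i) &= n_i p_i (1 - p_i),\\
\text{Poisson, log link with offset:}\quad & y_i \sim \mathrm{Poisson}(\mu_i), & \log \mu_i &= \log n_i + x_i'\beta, & \operatorname{Var}(y_i) &= \mu_i .
\end{aligned}
$$

In the Poisson version the implied rate is $\mu_i / n_i = \exp(x_i'\beta)$, which nothing constrains to stay below one, and the variance equals the mean instead of shrinking by the factor $(1 - p_i)$ as the rate approaches one. The two models are close when $p_i$ is small and diverge as participation becomes high.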

The Laconic
  • Thank you for your answer! However, what if we knew the number of employees and each employee could only have zero or one accounts? – Felix H Oct 13 '17 at 16:03
  • That's the binomial regression case. – The Laconic Oct 13 '17 at 18:56
  • Sure, but then what should be preferable? Binomial or count with some offset? – Felix H Oct 19 '17 at 12:37
  • Binomial. An offset doesn't do anything to keep the distribution bounded above; the number of accounts cannot, in principle, come from a Poisson distribution. On the other hand, if each employee can have zero or one accounts, and the probability $p$ of having an account is the same for each employee in a group of $n$ employees, the total number of accounts is literally distributed as $\mathrm{Binomial}(n,p)$. – The Laconic Oct 19 '17 at 23:59