
I have used `glm()` to model some data. The code looks like this:

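```r
# Pre-allocate the 90 x 90 matrix that stores one deviance per ddm/ppm pair
mdfit_dev <- matrix(NA_real_, nrow = 90, ncol = 90)

for (ddm_idx in 1:90) {
    for (ppm_idx in 1:90) {
        ddm <- cuse_ddm[[3 + ddm_idx]]
        ppm <- cuse_ppm[[3 + ppm_idx]]
        # I() makes ^2 an arithmetic square; a bare x^2 inside a formula
        # is read as an interaction and collapses to x
        mdfit <- glm(cuse[[4]] ~ ddm + I(ddm^2) + ppm + I(ppm^2) + ddm:ppm,
                     family = poisson(link = "log"))
        mdfit_dev[ddm_idx, ppm_idx] <- deviance(mdfit)
    }
}
```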
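One pitfall worth checking in a model like this: in R's formula interface, `^` means crossing of terms, so `x^2` on a single numeric predictor reduces to `x` and no squared term is fitted. Wrapping the power in `I()` keeps the arithmetic square. A minimal check using `terms()` (the variable names `y` and `x` are placeholders):

```r
# How R's formula interface reads ^2 on a single numeric predictor
labels_plain <- attr(terms(y ~ x + x^2), "term.labels")
labels_I     <- attr(terms(y ~ x + I(x^2)), "term.labels")

print(labels_plain)  # only "x": the squared term has been absorbed
print(labels_I)      # "x" and "I(x^2)": the square is kept as a regressor
```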

For each "case" I have about 90 different ddm variables and 90 different ppm variables (columns 4 to 93 of `cuse_ddm` and `cuse_ppm`), so the nested loop fits one model for every ddm/ppm pair. I believe the results are correct because a statistics post-doc ran the same models in MATLAB and got the same results.

However, my next task is to use a zero-inflated Poisson distribution, because there are a lot of zeros in my dataset. Some of these zeros are "true" zeros and some are false.
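For reference, the zero-inflated Poisson (ZIP) model treats each count as a two-part mixture: with probability $\pi$ the observation is an excess ("false") zero, and otherwise it is drawn from a Poisson distribution with mean $\lambda$. In ZIP regression, $\log\lambda$ is modelled by the regressors much as in the `glm()` call, and $\operatorname{logit}\pi$ can have its own regressors.

$$
\begin{aligned}
P(Y = 0) &= \pi + (1 - \pi)\, e^{-\lambda} \\
P(Y = k) &= (1 - \pi)\, \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 1, 2, \dots
\end{aligned}
$$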

How can I modify my code to use glm() for this distribution?

gung - Reinstate Monica
masfenix
  • Could you clarify what you mean by a false zero? – Glen_b Mar 24 '14 at 03:24
  • `cuse[[4]]` is the number of cases per week, over 240 weeks. The counts are reported by a person. In some weeks there were indeed 0 cases. In other weeks, the person was too lazy to count, did not show up to work, or forgot to count for that week; these are false zeros. – masfenix Mar 24 '14 at 03:49
  • Thanks. So missing values (`NA`) and true zeros are both coded as 0. – Glen_b Mar 24 '14 at 03:58
  • Yes, correct. I wish I could send you some data, but unfortunately I am under contract. I think http://stats.stackexchange.com/questions/45262/zero-inflated-count-models-in-r-what-is-the-real-advantage is what I am looking for, but I don't know what **regressors** are. – masfenix Mar 24 '14 at 04:03
  • I'm probably glad you *can't*. (If it were too big to include in the question, I don't want it anyway; in many cases, it may be better to make up a small example that shows the essential features of what you're dealing with.) – Glen_b Mar 24 '14 at 04:09
  • Thanks @Glen_b, I did update that comment. Would you know anything about regressors? – masfenix Mar 24 '14 at 04:10
  • Do you know what [regression](http://en.wikipedia.org/wiki/Regression_analysis) is? As in the first sentence [here](http://en.wikipedia.org/wiki/Linear_regression#Introduction_to_linear_regression)? In Poisson regression (and glms more generally), regressors (predictors, independent variables) play the same conceptual role as in multiple linear regression. – Glen_b Mar 24 '14 at 04:14
  • This question appears to be off-topic because it is about asking for code. – gung - Reinstate Monica Mar 27 '14 at 18:20
  • @gung Yes, this post does ask for code. But it also implicitly raises a question of how one could correctly handle mis-coded data in which true zeros are confounded with missing values. That question could only be answered here, not on SO, and a good answer would likely be much more useful than a pedestrian answer that points the OP to some blackbox code (which, if employed, would likely give bad results). – whuber Mar 27 '14 at 18:42
  • @whuber, good point. – gung - Reinstate Monica Mar 27 '14 at 18:52
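A minimal base-R sketch of the mechanism described in these comments: each week's recorded count is 0 if nobody counted, and otherwise a Poisson count. It fits an intercept-only ZIP model by maximum likelihood with `optim()` to show how the mixture separates the two kinds of zeros. The weekly mean (3), the probability of an uncounted week (0.2) and the seed are made-up values. The sketch also assumes that whether a week was counted is independent of how many cases occurred; if that assumption fails, as whuber's comment warns, a ZIP model can give misleading results.

```r
# Simulate 240 weeks in which some weeks were never counted (false zeros)
set.seed(1)
n_weeks     <- 240
lambda_true <- 3    # hypothetical mean count in weeks that were counted
pi_true     <- 0.2  # hypothetical probability that a week was not counted

not_counted <- rbinom(n_weeks, size = 1, prob = pi_true) == 1
y <- ifelse(not_counted, 0, rpois(n_weeks, lambda_true))

# Negative log-likelihood of the ZIP model, parameters on unconstrained scales:
# theta[1] = log(lambda), theta[2] = logit(pi)
zip_nll <- function(theta, y) {
  lambda  <- exp(theta[1])
  p       <- plogis(theta[2])
  ll_zero <- log(p + (1 - p) * exp(-lambda))            # true or false zero
  ll_pos  <- log(1 - p) + dpois(y, lambda, log = TRUE)  # positive count
  -sum(ifelse(y == 0, ll_zero, ll_pos))
}

fit <- optim(c(0, 0), zip_nll, y = y, method = "BFGS")
lambda_hat <- exp(fit$par[1])
pi_hat     <- plogis(fit$par[2])
print(c(lambda = lambda_hat, pi = pi_hat))
```

The estimates should land near the simulated values; a plain Poisson `glm()` on the same `y` would instead estimate a single mean pulled down by the uncounted weeks.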

1 Answer


`zeroinfl()` in the pscl package fits the zero-inflated Poisson regression model.
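A minimal sketch of how one iteration of the question's loop might look with `zeroinfl()`, assuming the pscl package is installed. The data are simulated stand-ins: `dat`, `y`, `ddm`, `ppm` and all numbers are hypothetical; in the real loop they would be `cuse[[4]]`, `cuse_ddm[[3 + ddm_idx]]` and `cuse_ppm[[3 + ppm_idx]]`. The formula part before `|` is the Poisson count model, mirroring the `glm()` call; the part after `|` models the probability of an excess zero, here with an intercept only.

```r
library(pscl)  # provides zeroinfl()

# Hypothetical stand-in data: 240 weeks, one ddm/ppm predictor pair,
# and roughly 20% of weeks recorded as 0 because nobody counted
set.seed(42)
n   <- 240
ddm <- rnorm(n)
ppm <- rnorm(n)
counted <- rbinom(n, 1, 0.8) == 1
y   <- ifelse(counted, rpois(n, exp(0.5 + 0.3 * ddm - 0.2 * ppm)), 0)
dat <- data.frame(y, ddm, ppm)

# Count part (before |) mirrors the glm() formula; zero-inflation part
# (after |) is intercept-only, i.e. a constant probability of a false zero
zfit <- zeroinfl(y ~ ddm + I(ddm^2) + ppm + I(ppm^2) + ddm:ppm | 1,
                 data = dat, dist = "poisson")
print(summary(zfit))

# Store -2 * log-likelihood in place of deviance(mdfit), e.g.
# mdfit_dev[ddm_idx, ppm_idx] <- m2ll inside the nested loop
m2ll <- -2 * as.numeric(logLik(zfit))
```

An intercept-only zero part treats every week as equally likely to be a false zero; if anything recorded predicts a missed count, it could be added after the `|`.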

pyoi
  • Welcome to the site, @pyoi. This isn't really an answer to the OP's question, it is more of a comment. Please only use the "Your Answer" field to provide answers. I know it's frustrating, but you will be able to comment anywhere when your reputation >50. Alternatively, you could expand this to make it more of an answer. Since you are new here, you may want to take our [tour](http://stats.stackexchange.com/tour), which contains information for new users. – gung - Reinstate Monica Mar 27 '14 at 18:22