
I am trying to model the mean intensities of parasites affecting a host in R using a negative binomial model. I keep getting 50 or more warnings that say:

    In dpois(y, mu, log = TRUE) : non-integer x = 251.529000

How can I deal with this? My code looks like this:

    library(MASS)  # glm.nb() is provided by the MASS package
    mst.nb <- glm.nb(Larvae + Nymphs + Adults ~ B.type + Month + Season, data = MI.df)
gung - Reinstate Monica
Natasha
  • Please add a [reproducible example](http://stackoverflow.com/q/5963269/1217536) for people to work with. – gung - Reinstate Monica Aug 05 '15 at 23:37
  • A negative binomial GLiM is a kind of count model. The response is supposed to be counts. A [count](https://en.wikipedia.org/wiki/Count_data), by definition, cannot be a fractional value. Do you have such values? – gung - Reinstate Monica Aug 05 '15 at 23:39
  • Can you clarify what you mean by "intensities"? Are you dividing a count of a parasite by, say, an amount of surface area for a host? – gung - Reinstate Monica Aug 06 '15 at 00:09
  • See [How is it possible that Poisson GLM accepts non-integer numbers?](http://stats.stackexchange.com/q/70054/17230). You could perhaps come up with a quasi-negative-binomial regression approach, but in practice quasi-Poisson regression is considered an alternative to NB regression. Or use a model appropriate for a continuous response. – Scortchi - Reinstate Monica Aug 06 '15 at 08:30
  • I have count data; however, I had to calculate the intensities to account for different sampling efforts. I do understand that I need count data there, but I was wondering whether there is another way to work with non-integer numbers using the same model. For the intensities I divided the count of parasites by the number of infected hosts. – Natasha Aug 07 '15 at 02:16
  • Thanks for the link, I can use a quasi-Poisson without any errors. Maybe I'll stick to that. :) – Natasha Aug 07 '15 at 02:23
  • @Natasha, don't do it. It is overwhelmingly likely that the right way to handle this problem is according to Gung's answer, with an offset. If you want to be sure, edit your question to explain a little bit more about where the differential sampling intensities come from. Are these different numbers of hosts? Different lengths of time sampled, or number of collectors? – Ben Bolker Aug 07 '15 at 23:04
  • @Natasha: Measuring intensity that way doesn't distinguish between counting 2 parasites across 10 hosts & 20 parasites across 100 hosts: Ben & gung are right. – Scortchi - Reinstate Monica Aug 10 '15 at 09:53
  • Test it to make sure it is really including the decimal part and if it isn't, write your own. – rwinkel2000 Aug 06 '15 at 03:07
  • This doesn't make a lot of sense. A negative binomial GLiM is a count model. There shouldn't be fractional values. If there are fractional values, we need to figure out what's going on, not write a new function that will somehow shoehorn the fractional values into the model fit. – gung - Reinstate Monica Aug 06 '15 at 03:16
  • Actually the negative binomial really is defined for non-integer parameters. You are right to point out that only integers make sense for some problems because you can't have "half a failure" etc. See "extension to real-valued r" in the Wikipedia article. This became an issue for me when I was trying to use the method of moments and also when trying to use maximum likelihood estimation. – rwinkel2000 Aug 06 '15 at 03:32
  • Non-integer parameters are fine, but it's still a distribution for discrete random variables. – Scortchi - Reinstate Monica Aug 06 '15 at 08:55
  • That is true. So Gung was right to begin with. – rwinkel2000 Aug 06 '15 at 10:20
  • there *are* cases where I can imagine non-integer response values making sense -- e.g. in cases of uncertainty ("I wasn't sure whether I counted 12 or 13 parasites in that host, so I'll score it as 12.5") -- but the default should be "if you're not *really, really sure* that you know what you're doing, pay attention to the warning. Don't try to work around it -- figure out (as @Gung said initially) what's going on and modify your model accordingly." – Ben Bolker Aug 07 '15 at 23:06
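
To make Scortchi's and Ben's points concrete, here is a minimal sketch on simulated data (the variables `hosts`, `counts` and `intensity` are hypothetical, not taken from the question). It suggests why quasi-Poisson accepts the ratios without complaint and what it then estimates: an intercept-only quasi-Poisson fit to per-sample intensities returns their unweighted mean, so a sample of 10 hosts counts as much as a sample of 100, whereas the same family fitted to the raw counts with `offset(log(hosts))` returns total parasites divided by total hosts.

```r
# Simulated, hypothetical data: sampling effort (hosts examined) differs by sample.
set.seed(2)
n         <- 100
hosts     <- sample(5:50, n, replace = TRUE)
counts    <- rnbinom(n, size = 2, mu = 4 * hosts)  # assumed true rate: 4 parasites per host
intensity <- counts / hosts                        # fractional "mean intensity"

# quasipoisson() defines no likelihood (its AIC is NA), so dpois() is never
# called and the fractional response is accepted without warnings.
fit_ratio <- glm(intensity ~ 1, family = quasipoisson)

# Same family, but modelling the counts with effort entered as an offset.
fit_offset <- glm(counts ~ offset(log(hosts)), family = quasipoisson)

# Ratio model: reproduces the unweighted mean of the per-sample intensities.
exp(coef(fit_ratio))
mean(intensity)

# Offset model: reproduces pooled parasites per host.
exp(coef(fit_offset))
sum(counts) / sum(hosts)
```

The two estimates will usually be close here because the simulated rate is constant, but they weight the samples differently, which is the information Scortchi's comment says is lost by dividing first.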

3 Answers


The negative binomial is a distribution for count data, so you really want your response variable to be counts (that is, non-negative whole numbers). That said, it is appropriate to account for "different sampling efforts" (I don't know exactly what you are referring to, but I get the gist of it). However, you should not try to do that by dividing your counts by another number. Instead, you need to use that other number as an offset. There is a nice discussion on CV of what an offset is here: When to use an offset in a Poisson regression? My guess is that your model should be something like:

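    library(MASS)  # glm.nb()
    # num.hosts: sampling effort, e.g. the number of hosts examined (column name is a guess)
    mst.nb <- glm.nb(Larvae + Nymphs + Adults ~ B.type + Month + Season +
                       offset(log(num.hosts)),
                     data = MI.df)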
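
A minimal simulated sketch of the offset approach, assuming (as the answer guesses) that sampling effort is the number of hosts examined. The data frame `sim`, its columns, and the true rates are invented for illustration; only `glm.nb()` from MASS and `offset()` are real. Because the response stays a whole-number count, no `dpois()` warnings appear, and the exponentiated coefficients read as parasites per host and as a rate ratio.

```r
library(MASS)  # glm.nb()

set.seed(1)
n <- 200
# Hypothetical data: effort (hosts examined) varies between samples.
sim <- data.frame(
  num.hosts = sample(5:50, n, replace = TRUE),
  B.type    = factor(sample(c("A", "B"), n, replace = TRUE))
)
# Assumed true rates: 3 parasites per host for type A, 6 for type B.
rate      <- ifelse(sim$B.type == "A", 3, 6)
sim$count <- rnbinom(n, size = 2, mu = rate * sim$num.hosts)

# Raw counts as the response; log(effort) enters with its coefficient fixed at 1.
fit <- glm.nb(count ~ B.type + offset(log(num.hosts)), data = sim)

# (Intercept): parasites per host for type A; B.typeB: rate ratio of B to A.
exp(coef(fit))
```

With such a fit, `predict(fit, newdata = data.frame(B.type = "B", num.hosts = 10), type = "response")` should give the expected count for a sample of 10 hosts.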
kjetil b halvorsen
gung - Reinstate Monica

It's a warning, not a fatal error. glm.nb() is expecting counts as your outcome variable, which are integers. Your data are not integers: 251.529.

R is saying "Hmmm... you might want to check this out and make sure it's OK, because it might not look right to me." If my memory is correct, SPSS doesn't give such a warning.

If you're sure that you're using the right model, even though you don't have integers, ignore it and keep going.
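
To see what the warning is saying, here is a small sketch calling `dpois()` directly. The value 251.529 comes from the question's warning; `lambda = 250` is an arbitrary choice. The Poisson density is defined only at whole numbers, so a fractional value gets probability zero (log-density `-Inf`) along with the warning. Inside `glm.nb()` the call most likely comes from the Poisson fit it runs for starting values, whose AIC computation evaluates `dpois(y, mu, log = TRUE)`, matching the question's message; that is an inference from the warning text.

```r
x <- 251.529  # fractional response value from the warning in the question

# Capture the warning text instead of letting it print.
msg <- tryCatch(dpois(x, lambda = 250, log = TRUE),
                warning = function(w) conditionMessage(w))
print(msg)

# Silenced, the log-density of the fractional value is -Inf (probability 0).
ll_frac <- suppressWarnings(dpois(x, lambda = 250, log = TRUE))
print(ll_frac)  # -Inf

# A whole-number count gets a finite log-density and no warning.
ll_int <- dpois(251, lambda = 250, log = TRUE)
print(ll_int)
```

Suppressing the warnings hides the message, but it does not make the negative binomial a model for non-integer data.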

Jeremy Miles
  • I know it is a warning; I was just wondering if there is a way around it. I do have integers, though, so I was trying to see if there is a way of working with integers but using the same model with different code. – Natasha Aug 07 '15 at 02:19
  • How to suppress warnings is explained here: https://stackoverflow.com/questions/16194212/how-to-suppress-warnings-globally-in-an-r-script – kjetil b halvorsen Jul 13 '19 at 11:51

I'm an ecological parasitologist. The way you should handle this is by cbind-ing the hosts that were parasitised and the ones that were not, and then using a binomial distribution.

Let's say you want to look at parasitised larvae: you would have the number of larvae that were healthy and the number that were parasitised.

For example, given Lh and Lp:

    parasitizedL <- cbind(Lp, Lh)  # column 1: parasitised, column 2: healthy
    hist(parasitizedL)

I'm guessing you can just use a regular binomial distribution with glm(), and might not need a negative binomial model.

    PLarvae1 <- glm(parasitizedL ~ B.type + Month + Season,
                    family = binomial, data = MI.df)

Then do stepwise model reduction to see which of your factors significantly affect parasitism: see this link.
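
A runnable sketch of this binomial approach on simulated data. Everything here is hypothetical: `dat`, `total`, and the prevalence values are invented, and `Lp`/`Lh` simply follow the answer's naming for parasitised and healthy counts. Note that this models prevalence (the proportion of hosts parasitised), a different quantity from the mean intensity the question asks about. `drop1()` with `test = "Chisq"` gives a likelihood-ratio test for removing each term in turn, one common way to carry out the model reduction described above.

```r
set.seed(3)
n <- 120
# Hypothetical samples: hosts examined per sample, with two predictors.
dat <- data.frame(
  Season = factor(rep(c("Dry", "Wet"), each = n / 2)),
  B.type = factor(sample(c("A", "B"), n, replace = TRUE)),
  total  = sample(10:40, n, replace = TRUE)
)
# Assumed truth: prevalence depends on Season only (about 27% dry, 45% wet).
p      <- plogis(-1 + 0.8 * (dat$Season == "Wet"))
dat$Lp <- rbinom(n, size = dat$total, prob = p)  # parasitised
dat$Lh <- dat$total - dat$Lp                     # healthy

# Two-column response: successes (parasitised), failures (healthy).
fit <- glm(cbind(Lp, Lh) ~ B.type + Season, family = binomial, data = dat)
summary(fit)

# Likelihood-ratio test for dropping each term in turn.
lrt <- drop1(fit, test = "Chisq")
print(lrt)
```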

However, it looks like you need random effects to account for repeated sampling, so your random effect will likely be (1|Season/Month), but it's hard to tell without knowing your data.

Marco Plebani