Fitting a Negative Binomial Distribution to a small data set

Question

I would like to perform a sample size calculation based on data from a small pilot study. In this experiment, cells are counted in each individual. There will be two different treatment groups which I will compare using glm(). I know how to conduct the sample size calculation once I have specified the distribution of my data.

The data from the pilot study for one group is:

cells <- c(11, 25, 4, 5, 1, 18, 3, 11, 13, 5, 25, 13)

As a barplot, it looks as follows:

How can I determine which distribution this data may have come from? Because it is count data and because of the "shape" of this barplot, I thought about fitting a negative binomial distribution. I simply tried out different values for the shape of the Gamma mixing distribution (the size argument) and found the fit quite suitable with size = 3 and mu = mean(cells).

barplot(table(rnbinom(n = 100000, size = 3, mu = mean(cells))))

However, how can I justify my choice more formally?

score 2 · Answer 1 · answered Mar 13 '20 at 13:52

There is a lot of good advice here; I will only show some illustrations using the R package vcd (Visualizing Categorical Data), which is your friend. First:

library(vcd)
summary(goodfit(table(cells), "nbin"))

     Goodness-of-fit test for nbinomial distribution

                     X^2 df    P(> X^2)
Likelihood Ratio 32.0899  5 5.70261e-06

so the fit is not very good, but better than

summary(goodfit(table(cells), "poi"))

     Goodness-of-fit test for poisson distribution

                     X^2 df     P(> X^2)
Likelihood Ratio 66.8993  6 1.764845e-12
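The two candidates can also be compared directly by maximum likelihood. The following is only a minimal base-R sketch, assuming the cells vector from the question; the helper nll_nb and the variable names are illustrative and not part of vcd.

```r
cells <- c(11, 25, 4, 5, 1, 18, 3, 11, 13, 5, 25, 13)

# Negative log-likelihood of the negative binomial. Parameters are kept on
# the log scale so that size and mu stay positive during optimisation.
nll_nb <- function(par, x) {
  -sum(dnbinom(x, size = exp(par[1]), mu = exp(par[2]), log = TRUE))
}

# Start from size = 1 and mu = sample mean (an arbitrary but harmless choice).
fit <- optim(c(log(1), log(mean(cells))), nll_nb, x = cells)
size_hat <- exp(fit$par[1])
mu_hat <- exp(fit$par[2])

# For the Poisson, the maximum likelihood estimate is the sample mean.
ll_pois <- sum(dpois(cells, lambda = mean(cells), log = TRUE))
ll_nb <- -fit$value

# AIC = 2 * (number of parameters) - 2 * log-likelihood; lower is better.
aic_pois <- 2 * 1 - 2 * ll_pois
aic_nb <- 2 * 2 - 2 * ll_nb

print(c(size = size_hat, mu = mu_hat))
print(c(AIC_poisson = aic_pois, AIC_nbinom = aic_nb))
```

With only 12 observations the estimate of size is very imprecise, so any AIC difference should be read as a rough indication rather than as proof of the distribution.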

It is better to use a visualization, for example:

distplot(table(cells), "nbin")
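If vcd is not at hand, a similar visual check can be sketched in base R by overlaying candidate cumulative distribution functions on the empirical one. The negative binomial parameters below (size = 3, mu = mean(cells)) are the values tried by eye in the question, not fitted estimates, and d_nb and d_pois are illustrative names.

```r
cells <- c(11, 25, 4, 5, 1, 18, 3, 11, 13, 5, 25, 13)
k <- 0:max(cells)

# Empirical CDF and the two candidate CDFs, evaluated on the same integer grid.
emp <- ecdf(cells)(k)
cdf_nb <- pnbinom(k, size = 3, mu = mean(cells))
cdf_pois <- ppois(k, lambda = mean(cells))

plot(k, emp, type = "s", ylim = c(0, 1),
     xlab = "cells", ylab = "cumulative probability")
lines(k, cdf_nb, type = "s", col = "red")
lines(k, cdf_pois, type = "s", col = "blue")
legend("bottomright",
       legend = c("empirical", "negative binomial", "Poisson"),
       col = c("black", "red", "blue"), lty = 1)

# Largest vertical gap between the empirical and each candidate CDF on the
# grid. This is a descriptive summary only, not a formal test.
d_nb <- max(abs(emp - cdf_nb))
d_pois <- max(abs(emp - cdf_pois))
print(c(nbinom = d_nb, poisson = d_pois))
```

A curve that stays close to the empirical steps over the whole range is the more plausible candidate, but with so few observations several distributions may look equally acceptable.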

Thanks a lot for your input! The vcd package seems worth a closer look for other purposes, too. Forgive me if this is too naive, but would you suggest using a negative binomial to perform the sample size calculation in my case? It seems to me that there is little evidence for any distribution. Btw, we actually ended up concluding that we cannot do any reasonable sample size calculation for this experiment and instead made some artificial "default" assumptions. — LuckyPal, Mar 17 '20 at 10:19
