If I count cells should I consider them as Poisson distributed?

Question

I measure (with flow cytometry) the percentage of lymphocytes with a specific receptor (Lph*) as a ratio to the total number of lymphocytes (Lph). Should I consider them (Lph*) as Poisson distributed?
(My data set is here.)

What is your goal in generating a probability model for your percentage? Is it possible to generate the necessary descriptives and inference without imposing assumptions about the nature of the data generating process? — AdamO, Feb 12 '13 at 17:01
@AdamO: I don't model the process. My goal is to understand whether substances activate the lymphocyte population of interest or not. — abc, Feb 19 '13 at 06:49
Then you should eschew making any assumptions about the probability model for that data unless there is a clear need to do so. — AdamO, Feb 21 '13 at 21:19
@AdamO: I would really like to do so. I was advised to use the SAS `PROC GLIMMIX` procedure, but the `DIST` option in its `MODEL` statement requires a type of distribution. — abc, Jun 09 '13 at 07:19

score 2 · Answer 1 · answered Jun 16 '11 at 12:38

The short answer is probably not, since:

the Poisson distribution is discrete, your data is continuous;
the Poisson distribution has support on 0, 1, 2, ..., whereas (I think) your data ranges from 0 to 100.

Without seeing your data and knowing your problem, it's tricky to give you a suggestion. A good starting position would be to look at the statistical analysis section of publications that analyse data similar to your data.

csgillespie

11,849
9
56
85

@csgillespie: thanks for your reply. In my humble opinion we can't count one and a half cells, so they are discrete... but yes, I can't count more than 100% of them :) – abc Jun 16 '11 at 13:23

@stan: Once you calculate the ratio: Lph*/Lph, your data is no longer discrete - or am I missing something? – csgillespie Jun 16 '11 at 13:59

@cs I think the point is that there are several analytical options: one is to view the ratio as a continuous variable; another is to use a Poisson model for Lph* with Lph as an offset. – whuber Jun 16 '11 at 17:50
@csgillespie: I found a link: _"We would now say that the statistics of counts conform to the Poisson distribution"_ in "Poisson Statistics and Precision in Counting" section & _"Some people seem to think that counting hundreds of thousands or millions of cells lets them beat the Poisson statistics"_ in the next one in [Practical Flow Cytometry](http://books.google.com/books?id=_fKfABEzCt0C&dq=Beckman+Coulter&q=Poisson+Statistics+and+Precision+in+Counting#v=snippet&q=Poisson%20Statistics%20and%20Precision%20in%20Counting&f=false) on Page 19. But you're right and he does as well... :( – abc Jun 17 '11 at 01:23

If you treat the percentages as proportions between 0 and 1 rather than as written, you could use the [beta distribution](http://en.wikipedia.org/wiki/Beta_distribution). Otherwise you might be able to approximate them with a [gamma distribution](http://en.wikipedia.org/wiki/Gamma_distribution), which is used in biology and even in [flow cytometry](http://www.google.com/#sclient=psy&hl=en&safe=off&source=hp&q=%22gamma+distribution%22+flow+cytometry&aq=f&aqi=&aql=&oq=&pbx=1&bav=on.2,or.r_gc.r_pw.&fp=895fc67dd14a86c2&biw=1440&bih=681); just keep in mind that the gamma is unbounded above. – Chris Simokat Jun 17 '11 at 05:06
@stan did you see the comment I made about this one regarding using Beta or Gamma distributions for your data? – Chris Simokat Jun 18 '11 at 01:28

@ChrisSimokat: Please note that the paper you linked for gamma distribution in flow cytometry is about a kinetic model (which is fundamentally different from the proportion counts the OP is talking about). – cbeleites unhappy with SX Feb 12 '13 at 17:46

@csgillespie: "Once you calculate the ratio: Lph*/Lph, your data is no longer discrete": in theory it still is, as it can only take the values {0, 1, 2, ..., Lph}/Lph. In practice, the variance due to meeting cells that have or have not expressed the receptor is not the only source of variance: cells can stick together, resulting in counts that are too low. "Something" may flow along that confuses the detector and is counted although it was not a cell. Receptor expression is probably a dichotomized continuous fluorescence intensity, implying that false positives and false negatives can occur. – cbeleites unhappy with SX Feb 12 '13 at 17:56

@Stan: with regard to your quotation from the flow cytometry book: I think the important part is the half of the sentence you left out: the number of cells of interest is important. So the smallest absolute count of cells with or without the receptor expressed is what determines whether and which approximations you can use. – cbeleites unhappy with SX Feb 12 '13 at 18:06
@cbeleites: thank you for the fresh thoughts on my question. So, what distribution should I consider in SAS PROC GLIMMIX? – abc Jun 01 '13 at 08:45
@stan: sorry, I'm no SAS user. But for a proportion I'd either go for binomial or, if the data set is on the safe side for using the normal approximation, I'd possibly use that. – cbeleites unhappy with SX Jun 01 '13 at 10:07
@cbeleites: you consider the .2(.3) ... (.7).8 data range as safe, don't you? As some of my raw data are around 1, I was pointed to the beta distribution. You suggested the binomial, but maybe it could be [hypergeometric](http://en.wikipedia.org/wiki/Hypergeometric_distribution), as we don't replace a detected cell during flow cytometry? Thank you. – abc Jun 01 '13 at 20:26

@stan: I guess whether you'd go for binomial or hypergeometric depends on a more philosophical view on your experiment. The point of view "we don't replace lymphocytes" leads to hypergeometric. If you view each lymphocyte as a "trial" of "what kinds of lymphocytes are produced in which proportion", that would lead to binomial. Maybe you can do the calculations for all three distributions - you can then either report that the approximations are in fact OK and no practically important differences occur, or you gain knowledge about the limits of the approximations. – cbeleites unhappy with SX Jun 02 '13 at 11:01

score 1 · Accepted Answer · answered Feb 12 '13 at 17:39

A Poisson distribution makes sense from the general point of view of flow cytometry measurements: you "sit on your detector" and wait a random time for the next lymphocyte to come along. The same is true for lymphocytes expressing the receptor you're interested in. But you are then asking about the ratio of two such Poisson-distributed counts.
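
The Poisson and binomial views are closer than they may look. If the receptor-positive and receptor-negative counts are independent Poisson variables, then conditional on the total number of cells counted, the positive count is binomial with $p = \lambda_+ / (\lambda_+ + \lambda_-)$. The sketch below checks this numerically; the rates and the total are made-up illustration values, not taken from the question's data.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def conditional_pmf(k: int, n: int, lam_pos: float, lam_neg: float) -> float:
    """P(X = k | X + Y = n) for independent X ~ Poisson(lam_pos), Y ~ Poisson(lam_neg)."""
    joint = poisson_pmf(k, lam_pos) * poisson_pmf(n - k, lam_neg)
    # The sum of independent Poisson variables is Poisson with the summed rate.
    return joint / poisson_pmf(n, lam_pos + lam_neg)


# Hypothetical rates (assumption): 3 positive and 17 negative cells expected.
lam_pos, lam_neg, n = 3.0, 17.0, 20
p = lam_pos / (lam_pos + lam_neg)

# Compare the conditional Poisson pmf with the binomial pmf for every possible k.
max_diff = max(
    abs(conditional_pmf(k, n, lam_pos, lam_neg) - binomial_pmf(k, n, p))
    for k in range(n + 1)
)
print(f"largest difference between the two pmfs: {max_diff:.2e}")
```

The difference is at the level of floating-point rounding, so once the total count is treated as fixed, modelling the proportion as binomial does not contradict the Poisson arrival picture.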

If you focus on the proportion, you can assume that a true underlying proportion $p$ (possibly depending on the treatment) of lymphocytes exists that does express the receptor. That would be more like a Bernoulli experiment (binomial distribution). You sit on your detector and look at whether the next lymphocyte coming along does express the receptor (it doesn't matter how long you have to wait for it), which happens with probability $p$.
Note that the binomial distribution is related to the beta distribution - you get beta distributions when estimating the true proportion $p$ of a binomial distribution from Bernoulli experiments.
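
One concrete form of this relationship, given as a sketch rather than as what the answer necessarily had in mind: in a Bayesian treatment with a Beta(a, b) prior on $p$, observing $k$ positive cells among $n$ gives a Beta(a + k, b + n - k) posterior. The function names and the example counts below are hypothetical.

```python
import math


def beta_posterior(k: int, n: int, a: float = 1.0, b: float = 1.0):
    """Posterior Beta parameters for p after k positives in n cells.

    Assumes a Beta(a, b) prior; a = b = 1 is the uniform prior.
    """
    return a + k, b + (n - k)


def beta_mean_sd(alpha: float, beta: float):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total**2 * (total + 1))
    return mean, math.sqrt(var)


# Hypothetical run (assumption): 150 positive cells among 100 000 lymphocytes.
alpha, beta = beta_posterior(k=150, n=100_000)
mean, sd = beta_mean_sd(alpha, beta)
print(f"posterior: Beta({alpha:.0f}, {beta:.0f}), mean = {mean:.6f}, sd = {sd:.2e}")
```

With counts this large the prior has little influence, and the posterior mean is very close to the raw proportion $k/n$.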

If you look at large enough numbers of cells, you can use approximations (e.g. the normal approximation of the binomial if the smaller of $np$ and $n(1-p)$ exceeds 5 or, better, 10).
Assuming you have 10⁵ lymphocytes per FACS run, that means that for 0.0001 $\leq p \leq$ 0.9999 you should be OK with the normal approximation. As the lowest proportion you report is 0.0015, you are on the safe side even if you add a bit more "safety margin" for the fact that you only have an observed $\hat p$, not the true proportion $p$ (unless your FACS run takes only a very small aliquot of the sample).
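
The rule of thumb above can be checked with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, the threshold of 10 is the stricter of the two values mentioned, and the Wald interval with z = 1.96 (roughly 95%) is just one common way to use the normal approximation, not something prescribed here.

```python
import math


def normal_approx_ok(n: int, p: float, threshold: float = 10.0) -> bool:
    """Rule of thumb: the smaller of n*p and n*(1-p) should reach the threshold."""
    return min(n * p, n * (1 - p)) >= threshold


def wald_interval(k: int, n: int, z: float = 1.96):
    """Normal-approximation (Wald) interval for a proportion; z = 1.96 is ~95%."""
    p_hat = k / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


n = 100_000  # lymphocytes per run, as assumed in the text
print(normal_approx_ok(n, 0.0015))   # lowest reported proportion: n*p = 150
print(normal_approx_ok(n, 0.00001))  # a much rarer population: n*p = 1

# Hypothetical count (assumption): 150 positives, i.e. p_hat = 0.0015.
low, high = wald_interval(150, n)
print(f"approximate 95% interval: [{low:.5f}, {high:.5f}]")
```

Proportions that sit exactly at the boundary (such as 0.0001 with 10⁵ cells) can land on either side of the threshold because of floating-point rounding, so treat the check as a guide rather than a sharp cut-off.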

See Wikipedia on distributions related to the Poisson distribution and Wikipedia on distributions related to the binomial distribution for relationships and also rules of thumb about the approximations.
