
I have 79 observations: the first quartile is 30, the median 50, the mean 50.5, and the third quartile 68. The maximum value is 169 and the minimum is zero.

In particular, there are only 4 observations at zero.

The distribution without the zeros fits a log-normal distribution very well, which is also confirmed by a Shapiro–Wilk test on the log-transformed data.

So several questions arise:

  1. Can I consider the entire distribution (with the zeros) to be a log-normal with a lower bound at zero?
  2. Can I consider the 4 zeros statistically irrelevant?
  3. If not, how can I fit a log-normal distribution to observations that include zeros? I'm using fitdistrplus in R, but it cannot handle zeros.
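
A common way to handle question 3 is a two-part ("delta lognormal") model: a point mass at zero mixed with a lognormal for the positive values (this is also what the comments below suggest). Because the likelihood factorizes, the two parts can be fitted separately. A minimal sketch, assuming the sample is stored in a numeric vector `x` (a hypothetical name):

```r
# Two-part ("delta lognormal") fit: a point mass at zero plus a
# lognormal for the positive values. `x` is a hypothetical name for
# the vector holding the 79 observations.
library(fitdistrplus)

p_zero <- mean(x == 0)              # estimated P(X = 0); here 4/79
x_pos  <- x[x > 0]                  # strictly positive part

fit_pos <- fitdist(x_pos, "lnorm")  # lognormal fit to the positives
summary(fit_pos)

# Fitted model: with probability p_zero the value is 0; otherwise it
# follows Lognormal(meanlog, sdlog) with the parameters from fit_pos.
```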
Jeno
  • Are the zeros true zeros, or do they represent values below the limit of detection? – ReneBt Apr 20 '18 at 13:07 (a censored-data sketch for this case follows these comments)
  • Thanks for your comment. I'd like to understand a way to proceed in both of the cases you describe. – Jeno Apr 20 '18 at 13:18
  • Some have called this a "delta lognormal" distribution. It's a mixture of an atom at zero and a lognormal distribution. I used it as an example at https://stats.stackexchange.com/a/30749/919. Knowing it's a mixture might point you towards effective solutions. For instance, you could coerce Gaussian-mixture software into giving you estimates (with standard errors) by taking logs of the positive values and replacing the zeros by a narrow sequence of extremely negative values. You could also implement your own maximum-likelihood solution. – whuber Apr 20 '18 at 14:02 (a maximum-likelihood sketch follows these comments)
  • Many thanks, I read the example with great interest; that could be a way to treat my entire distribution. About the other question: could you explain when I can consider the zeros not relevant? I mean, what kind of analysis could compare the distribution with the zeros to the distribution without them and lead to the conclusion that I can remove the zeros? Could the fact that I have only 4 zeros out of 79 observations be helpful? – Jeno Apr 20 '18 at 17:51
  • Leo Goodman, a University of Chicago mathematical sociologist, once proposed substituting zeros in contingency tables with a small decimal value, e.g., 0.0001. The problem with his approach is that the natural log of such a small value explodes into a large negative number. It sounds as though your data are otherwise strictly positive. Given that, another possible workaround is to add a constant of +1 to every value and then transform the results with the natural log. Of course, at the other end one has to remember to subtract 1 from any back-transformed results. – Mike Hunter Apr 20 '18 at 18:22
  • I read several discussions about substituting zeros with a small decimal value, but it does not seem to be a "scientific" approach: although such substituted values are all close to zero, their logarithms are quite different. So I would prefer to understand if (and when) I can remove the zeros. If I can't, the mixture of an atom at zero and a lognormal could be a way to proceed. – Jeno Apr 21 '18 at 11:55
  • I'm not too sure about *scientific*, since Goodman's suggestion was intended as a heuristic workaround, not a theoretical derivation. Few, if any, would suggest that Goodman was not *scientific* in his methods, models, and papers. Another option is James Tobin's *tobit* model, an early two-stage solution for *zero-heavy* data. The next related development after Tobin was Heckman's *selection-bias* model. Since Tobin and Heckman there has been a profusion of models focused on unpacking zero-heavy distributions (e.g., compound Poisson, extended Poisson–Tweedie, etc.). – Mike Hunter Apr 22 '18 at 15:15
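
Following whuber's suggestion to implement a maximum-likelihood solution for the delta-lognormal mixture, here is a minimal sketch; `x` is again a hypothetical vector name, and the starting values are rough guesses based on the summary statistics in the question. Since the likelihood factorizes, this should reproduce the two-part fit above, but the Hessian gives joint standard errors.

```r
# Direct maximum-likelihood fit of the delta-lognormal mixture:
# P(X = 0) = p and X | X > 0 ~ Lognormal(mu, sigma).
negloglik <- function(par, x) {
  p     <- plogis(par[1])  # zero probability, constrained to (0, 1)
  mu    <- par[2]          # meanlog
  sigma <- exp(par[3])     # sdlog, constrained positive
  -sum(ifelse(x == 0,
              log(p),
              log(1 - p) + dlnorm(x, mu, sigma, log = TRUE)))
}

# Starting values: ~5% zeros, median near 50, moderate log-scale spread
start <- c(qlogis(0.05), log(50), log(0.5))
fit   <- optim(start, negloglik, x = x, hessian = TRUE)

# Back-transform to the natural parameter scale
c(p = plogis(fit$par[1]), meanlog = fit$par[2], sdlog = exp(fit$par[3]))
```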

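And if the zeros represent values below a limit of detection rather than true zeros (the first comment's question), the whole sample can instead be fitted as left-censored lognormal data with fitdistrplus::fitdistcens. A sketch, where `x` and the detection limit `lod` are assumed names/values:

```r
library(fitdistrplus)

lod <- 1  # assumed detection limit; substitute the instrument's real one

# fitdistcens expects a data frame with `left` and `right` columns;
# left = NA encodes "left-censored below `right`", and exact
# observations have left == right.
cens <- data.frame(
  left  = ifelse(x == 0, NA_real_, x),
  right = ifelse(x == 0, lod, x)
)

fit_cens <- fitdistcens(cens, "lnorm")
summary(fit_cens)
```
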
0 Answers