
I'm looking for an approximation to the curve of a lognormal distribution, for use in non-linear regression against a dataset. As an alternative, I'm interested in an approximation to the CDF thereof.

I have several goals:

  1. Determine how closely a sample dataset drawn from a process of unknown distribution matches a lognormal distribution.
  2. Given a set of samples drawn from a process determined or known to be lognormally distributed, but with unknown parameters, determine the probability of observing a future sample with a given value that may lie outside the range of values observed thus far.
  3. Given a set of samples as above, and a new sample that may lie outside the range of values in the samples already seen, determine the likelihood of having observed that sample.
  4. While computing the above, efficiency and simplicity of the implementation are quite important. A fast approximation with well-understood properties, such as error bounds, is better than an accurate algorithm that is difficult to implement or computationally intensive.

For these purposes, I think it is best if the approximation has a closed form, so that a nonlinear regression can be done with lower computational overhead. Ideally, if it approximates the lognormal density itself, it would also have a simple integral so that the CDF can be approximated too.

It's possible that I'm trying to go about this the wrong way. Here's a more specific question to help figure that out: suppose I gather 1000 samples from my process. The values of the samples mostly range from 1 to 10, with occasional samples up to 20 or so. I know (based on experience) that long-term it is quite possible to see samples in the range of 10 times higher than that, but I haven't observed any from this process yet. How can I determine the probability of the 1001st sample having a value greater than 100 or another arbitrary number? If the 1001st sample's value is 180, how can I determine how likely that was, based on the first 1000 samples?

kjetil b halvorsen
  • You may want to consider the [log-logistic distribution](http://en.wikipedia.org/wiki/Log-logistic_distribution), which has a truly closed form (in terms of elementary functions). This distribution, although it has slightly heavier tails, is usually considered similar to the lognormal distribution. – LessFaceMoreBook Jan 30 '14 at 23:52
  • It's not clear why you need to *approximate* the lognormal pdf and cdf in order to answer the first question (whether a sample distribution is consistent with the lognormal). Nor the second question, nor indeed the third question. For example, the first question might be pretty easily answered by performing a normal goodness-of-fit test (some of which are fairly fast to compute) on the logs of the data. – Glen_b Jan 31 '14 at 00:43
  • (+1) to @Glen's comment. And, just so you are aware, that Aludaat and Alodat approximation is, unfortunately, pretty bad in the tails. For some things it might be ok, but caution is advised; [this answer](http://stats.stackexchange.com/a/7206/2970) gives some additional overview on that. Note also that the cdf of a lognormal can be written in terms of the cdf of a standard normal. At any rate, maybe a more explicit example of the problem you're trying to solve would help users here provide more concrete guidance. Welcome to the site and cheers. :-) – cardinal Jan 31 '14 at 02:18
  • Thanks for the hints. I am interested in learning more about all of the above. Are these appropriate things to expand upon in answers, or should I look at other answers? – Baron Schwartz Jan 31 '14 at 12:04
  • Argh. Can't press enter to start a new paragraph :) On the topic of a more explicit example, I'll add that to the question. – Baron Schwartz Jan 31 '14 at 12:06
  • I removed the following from the question: "To give an analogy to a different distribution, I have had quite good success applying this technique to an approximation of the standard normal CDF, found in Applied Mathematical Sciences, Vol. 2, 2008, no. 9, 425 - 429 (Aludaat and Alodat)." The reason is that it was just one approximation I examined; I ended up using http://home.online.no/~pjacklam/notes/invnorm/ instead, and I don't use that in the same way I'm proposing here. Sorry for the distraction. – Baron Schwartz Jan 31 '14 at 12:16
  • Baron, Acklam's approximation to the inverse distribution is *excellent*. The only major modern competitor in terms of accuracy is Wichura's algorithm. – cardinal Jan 31 '14 at 12:21
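
To make the suggestions in the comments concrete, here is a minimal sketch, assuming the data really are lognormal. It works on the logs of the data and uses the identity that the lognormal CDF at x equals the standard normal CDF at (ln x - mu) / sigma. The function names (`fit_lognormal`, `lognormal_cdf`, `lognormal_sf`) and the simulated parameters are hypothetical, chosen only for illustration; the estimates are the plain sample mean and standard deviation of the logs, and the result ignores the uncertainty in those estimates.

```python
import math
import random


def fit_lognormal(samples):
    """Estimate (mu, sigma) of ln X from the sample mean and
    sample standard deviation of the logged data."""
    logs = [math.log(x) for x in samples]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / (n - 1)
    return mu, math.sqrt(var)


def lognormal_cdf(x, mu, sigma):
    """P(X <= x) = Phi((ln x - mu) / sigma), with Phi written via erfc."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2))


def lognormal_sf(x, mu, sigma):
    """P(X > x), computed directly with erfc so that small tail
    probabilities are not lost by subtracting from 1."""
    if x <= 0:
        return 1.0
    z = (math.log(x) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Assumption: simulated stand-in for the 1000 observed samples.
    random.seed(0)
    samples = [random.lognormvariate(1.0, 0.6) for _ in range(1000)]
    mu, sigma = fit_lognormal(samples)
    print("estimated mu, sigma:", mu, sigma)
    print("P(next sample > 100):", lognormal_sf(100.0, mu, sigma))
    print("P(next sample > 180):", lognormal_sf(180.0, mu, sigma))
```

Because the tail probability depends heavily on the fitted sigma, extrapolating far beyond the observed range is only as trustworthy as the lognormal assumption itself, which is why checking the logs for normality first (as suggested above) matters.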

0 Answers