I have been trying to work with random variables in Python and more generally. My model works with large numbers and lognormal distributions, but it is very hard to get a clear answer anywhere on the intuition behind the parameters as the various software packages see them, which stems from the fact that the mathematical explanations are equally unclear about what the parameters mean, or even what they are called.
My Python SciPy code looks like this:
from scipy.stats import lognorm
import numpy
import matplotlib.pyplot as plt

rv = lognorm(s=1., scale=100000., loc=0)
x = numpy.arange(1, 1000000, 1000)
plt.plot(x, rv.pdf(x))
plt.show()
When s=1 you get a graph that looks like a textbook lognormal distribution, with a believable shape that would have the right mean, mode, etc.
What I find is that as I increase s, the plot gets tighter to the left and does not flatten out as any text suggests it should. By the time s=3 or 4, the graph looks like a spike at zero, and is essentially zero everywhere else.
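(For anyone who wants to reproduce that behavior, here is a minimal sketch that overlays a few values of s on the same scale; the particular values are just for illustration.)

import numpy
import matplotlib.pyplot as plt
from scipy.stats import lognorm

x = numpy.arange(1, 1000000, 1000)
for s in (0.5, 1.0, 2.0, 4.0):
    # larger s pushes the density into a spike near zero at this scale
    plt.plot(x, lognorm(s=s, scale=100000.).pdf(x), label="s=%.1f" % s)
plt.legend()
plt.show()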
s is the standard deviation of the underlying random variable X, which is a normal with mean equal to ln(scale) (ln(100000) in the code above) and standard deviation equal to whatever it needs to be to achieve the desired result. In my instance, the quantity being modeled is around 1,000,000 units, and a 10% random change (100,000 units) is what I want to model.
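In scipy terms, s plays the role of that underlying standard deviation and scale plays the role of e^mu. Here is a quick sanity check of that mapping (just a sketch; it compares scipy's frozen distribution against numpy's mean/sigma parameterization of the same lognormal):

import numpy
from scipy.stats import lognorm

mu, sigma = numpy.log(100000.), 1.0
rv = lognorm(s=sigma, scale=numpy.exp(mu))   # scipy: shape s = sigma, scale = e^mu
# numpy parameterizes directly by the mean and sigma of the underlying normal
samples = numpy.random.default_rng(0).lognormal(mean=mu, sigma=sigma, size=100000)
print(rv.median(), numpy.median(samples))    # both close to 100,000, i.e. e^mu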
After hours of thinking, tinkering, and experimenting, I finally cut through the clutter and figured out the answer, which - having re-written the question - I will now give.
The tricky bit here is pulling back from the real world into the fictitious normally distributed RV, $X$, which is generating the lognormal $Y=e^X$. That underlying RV is normally distributed with a mean of $ln(1,000,000)$, which is 13.816. The trick is to go mentally back 'into the real world' and ask yourself "what would the number be if a one standard deviation event occurred?" In my case that would be a 10% range, or 1,100,000 (or 900,000).
But the natural log of 1,100,000 is 13.911. The difference, $ln(1,100,000) - ln(1,000,000) = ln(1.1) \approx 0.095$, is only about 0.10, so the underlying RV has mean 13.816 and a stdev of about 0.10. Since that stdev is what goes into the lognorm call (the parameter scipy calls the 'shape', s), this explains (1) how to get the thing to work and (2) why the 'shape' factor is usually so small (<2): when dealing with real-world variations of reasonable size (one-sigma moves of plus or minus 50% or less), the underlying RV $X$ has a very, very tight variance.
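Putting that together, here is a minimal sketch of the parameterization I ended up with (the 1,000,000 median and the 10% one-sigma move are just my numbers, nothing special about them):

import numpy
from scipy.stats import lognorm

median = 1000000.                     # the real-world 'center' of Y (its median, which is e^mu)
sigma = numpy.log(1.1)                # a one standard deviation event is a 10% move, so sigma = ln(1.1), about 0.095
rv = lognorm(s=sigma, scale=median)
print(rv.median())                                            # 1,000,000
print(median * numpy.exp(sigma), median * numpy.exp(-sigma))  # ~1,100,000 and ~909,000: the one-sigma band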
Hope this is helpful to others. I sure spent a lot of time and never found a good resource or answer.