If I know the distributional parameters of a distribution (lognormal, in this case), I can plot the density function. To convert the density into a histogram, I need to calculate the "height" of the bars associated with different intervals under the curve (i.e., $0$ to $99.9$, $100$ to $199.9$, and so on). How can I calculate the value of the density associated with each bin, so that I can obtain the histogram corresponding to the distribution?

1 Answer

Say that you have $n-1$ bins of the form $(t_{k-1}, t_k]$, with boundaries $t_1, t_2,\dots,t_n$. If you want to "histogrammize" your density function, simply take the probabilities that $X$ falls in each bin $B_k = (t_{k-1}, t_k]$

$$ p_k = \Pr(X \in B_k) = \int_{t_{k-1}}^{t_k} f(t)\, dt = F(t_k) - F(t_{k-1}) $$

where $f$ is a density function and $F$ is a cumulative distribution function, and divide them by the bin widths

$$ \hat f(x) = \frac{p_k}{t_k - t_{k-1}} \qquad \text{for } x \in (t_{k-1}, t_k] $$

for it to be consistent with the definition of histogram (Scott, 1992):

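$$ \hat f(x) = \frac{1}{Nh} \sum_{i=1}^N I_{(t_{k-1}, t_k]}(x_i) \qquad \text{for } x \in B_k $$

where $x_1,\dots,x_N$ are the observations, $N$ is the sample size (written $N$ here to avoid a clash with the $n$ bin boundaries above), and $h = t_k - t_{k-1}$ is the bin width. The empirical proportion of observations falling in $B_k$ plays the role of $p_k$.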
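
As a rough illustration of the recipe above, here is a minimal Python sketch that uses only the standard library. The parameter values `mu = 8.0` and `sigma = 1.0` are made up for the example, and the helper names `lognormal_cdf` and `histogram_heights` are hypothetical, not taken from any package. It assumes `mu` and `sigma` are the mean and standard deviation of $\log X$.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """Lognormal CDF F(x); mu and sigma are the mean and sd of log(X)."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def histogram_heights(edges, cdf):
    """For each bin (t_{k-1}, t_k], return p_k and the bar height p_k / width."""
    probs, heights = [], []
    for lo, hi in zip(edges, edges[1:]):
        p = cdf(hi) - cdf(lo)          # p_k = F(t_k) - F(t_{k-1})
        probs.append(p)
        heights.append(p / (hi - lo))  # probability per unit interval
    return probs, heights


# Illustrative (assumed) parameters and bins of width 100 on [0, 250000].
mu, sigma = 8.0, 1.0
width = 100
edges = list(range(0, 250000 + width, width))  # 2501 boundaries, 2500 bins

probs, heights = histogram_heights(edges, lambda x: lognormal_cdf(x, mu, sigma))

# Show the first few bins.
for (lo, hi), p, h in list(zip(zip(edges, edges[1:]), probs, heights))[:3]:
    print(f"({lo}, {hi}]  p_k = {p:.6g}  height = {h:.6g}")
```

The bar areas (height times width) sum to $F(250000) - F(0)$, which is close to 1 only when the chosen range covers almost all of the probability mass.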

A similar approach is used to discretize continuous distributions to obtain their discrete analogs, as described in the thread "Implementing a discrete analogue to Gaussian function".

Scott, D.W. (1992). Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley.

Tim
  • Brilliant. Thanks. Do you know if there is a way to do this in Excel or Stata, so that I can just plug in the values of the distributional parameters and the bin width and extract those values? –  Aug 07 '17 at 12:34
  • @a.russoIT you simply need to evaluate the CDF at the endpoints of the bins; I believe both programs can do it. – Tim Aug 07 '17 at 12:35
  • It helps a lot. I need to find code to solve those integrals. My x goes from 0 to 250000. Assuming a bin width of 100, I would have to calculate 2500 integrals, which is not feasible. Any help with code and software? –  Aug 07 '17 at 13:43
  • @a.russoIT but you *do not* need to solve the integrals! You only need the CDF, e.g. in Excel `=LOGNORM.DIST(x_max; mean; sd; TRUE) - LOGNORM.DIST(x_min; mean; sd; TRUE)` – Tim Aug 07 '17 at 13:46
  • With your formula to use $F(x_i) - F(x_{i-1})$ for the bar heights, you describe how to construct a *bar chart*, not a histogram. The heights of the histogram's bars must be the *probabilities per unit interval* in each bin, given by $$h_i = \frac{1}{x_i - x_{i-1}}\left(F(x_i) - F(x_{i-1})\right).$$ Although this might seem like just a theoretical nicety, it gets to the very concept of what a histogram is and what a density is; and because many users exhibit confusion about this, I urge you to modify and clarify your answer. – whuber Aug 07 '17 at 16:05
  • @whuber you are right, I corrected it. – Tim Aug 07 '17 at 18:23