While browsing for information on how I might plot a fitted normal curve over a histogram, I found the following:
http://www.statmethods.net/graphs/density.html
There is a line I don't fully understand, though I recognize that it really does work:
yfit <- yfit*diff(h$mids[1:2])*length(x)
Here, yfit is initially a vector of values of the pdf of the fitted normal distribution, evaluated at regular intervals along the x-axis; length(x) is the number of observations in the vector x from which the histogram was prepared; and diff(h$mids[1:2]) is the difference between the midpoints of the first two bars of that histogram on the x-axis. After this statement runs, yfit holds its original values multiplied by those two terms.
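For context, the surrounding code is roughly the following. This is a sketch reconstructed from the linked page, so the data set (mtcars$mpg), the breaks value, and the 40-point grid should be read as assumptions rather than anything essential to the question.

```r
# Sketch of the setup; mtcars$mpg, breaks = 10 and length.out = 40 are assumed
x <- mtcars$mpg
h <- hist(x, breaks = 10, xlab = "Miles Per Gallon",
          main = "Histogram with Normal Curve")

# Evaluate the fitted normal pdf on a regular grid along the x-axis
xfit <- seq(min(x), max(x), length.out = 40)
yfit <- dnorm(xfit, mean = mean(x), sd = sd(x))

# The line in question: rescale by the spacing between bar midpoints
# and by the number of observations
yfit <- yfit * diff(h$mids[1:2]) * length(x)

# Overlay the rescaled curve on the histogram
lines(xfit, yfit, col = "blue", lwd = 2)
```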
I understand that multiplying by length(x) makes sense, as this turns values of a probability density function into numbers of observations around each respective value, bearing in mind that a continuous pdf is being used here and the number of observations at any single point is zero.
I don't understand why it is necessary to multiply by diff(h$mids[1:2]) to get the right outcome in the graph, although I can confirm that it does get the right outcome.
Does anyone have an explanation?