
Say I have a histogram of binned data in which each bin is $n$ units wide, where $n$ is an arbitrary integer, so the bins need not be equal in width. The data are sampled from a population that is normally distributed (or follows some other distribution, if a more general form of the solution exists). How can I best estimate the number of data points at a resolution of 1 unit (for instance, by integrating from 0.5 below each value to 0.5 above it) while ensuring that the estimates within each bin sum to the binned count, which we know is the "real world truth"? Knowing the shape of the distribution should let us improve on simply spreading the points uniformly at random within each bin, but is there a way to adjust the estimate further so that it does not contradict the measured data, or at least minimises the discrepancy?

Some data, the weight of animals, with an assumed normal distribution:

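$$\begin{array}{lr}
\text{Weight (lb)} & \text{Headcount} \\ \hline
0\text{-}600 & 340{,}000 \\
600\text{-}699 & 365{,}000 \\
700\text{-}799 & 494{,}000 \\
800\text{-}899 & 430{,}000 \\
900\text{-}999 & 110{,}000 \\
1000\text{-}3000 & 40{,}000
\end{array}$$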

Goal: estimate the number of animals at $1\,\text{lb}, 2\,\text{lb}, \dots, 3000\,\text{lb}$
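To make the goal concrete, here is one possible formalisation. The notation is introduced here and is not part of the data: $B_j$ is the set of integer weights in bin $j$, $N_j$ is its headcount, $F$ is the CDF of the assumed distribution, and $\hat c_k$ is the estimated count at weight $k$. The constraint is that the estimates add up to the observed count in every bin, and one simple estimate that satisfies it by construction is a within-bin rescaling of the model's probability mass:

$$\sum_{k \in B_j} \hat c_k = N_j \quad \text{for every bin } j, \qquad \hat c_k = N_j \, \frac{F(k + 0.5) - F(k - 0.5)}{\sum_{i \in B_j} \bigl[ F(i + 0.5) - F(i - 0.5) \bigr]} \quad \text{for } k \in B_j .$$

This is only one candidate; I do not know whether it is the best estimate in any formal sense.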

Any solution would be helpful - if anyone knows how to approach this kind of problem where the bins are regular, or any other simplifying assumption is made, it would still be very helpful! Thank you :)

Hiraphor
  • Your Normal distribution assumption implies you can obtain an answer once you have estimated two parameters. These are usually taken to be the mean and SD. See https://stats.stackexchange.com/questions/60256/ for a standard way to estimate them from binned data. Your requirement that the estimates agree with the "real world truth" is self-contradictory: either you can *interpolate* within the bins or you can *estimate* the underlying distribution; you usually cannot do both in the same way. Which of those is your actual objective? – whuber Jun 22 '21 at 13:07
  • The aim is to interpolate within the bins. I don't understand why the population's normal distribution means I can obtain an answer; I just want to use all the available information to get the best estimate at the required precision. A normally distributed population tells me the sampled counts should decrease away from a maximum, for instance, and my data has a maximum. However, a fitted distribution may under- or overestimate a bin; this is the problem I need to correct for in some way if that approach is used – Hiraphor Jun 22 '21 at 13:45
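One way to combine the two objectives discussed in these comments, offered as a minimal sketch rather than a definitive method: fit a normal distribution to the binned counts by maximum likelihood, then use the fitted curve only to decide how each bin's count is shared among its 1 lb units, so every bin total is reproduced. Several things here are assumptions: the first bin is read as 0-599 lb so that it does not overlap the next one, the normal is truncated to the observed range, and the helper names (`bin_mass`, `neg_log_lik`, `fit_normal`, `disaggregate`) and the crude pattern search are illustrative choices, not a standard routine.

```python
from math import log, sqrt
from statistics import NormalDist

# Bins as inclusive integer ranges (lo, hi, headcount). Reading the first bin
# as 0-599 lb is an assumption, made so it does not overlap the 600-699 lb bin.
bins = [
    (0, 599, 340_000),
    (600, 699, 365_000),
    (700, 799, 494_000),
    (800, 899, 430_000),
    (900, 999, 110_000),
    (1000, 3000, 40_000),
]


def bin_mass(dist, lo, hi):
    """Probability the normal `dist` assigns to [lo - 0.5, hi + 0.5]."""
    return dist.cdf(hi + 0.5) - dist.cdf(lo - 0.5)


def neg_log_lik(mu, sigma):
    """Multinomial negative log-likelihood of the binned counts, with the
    normal truncated to the observed range so bin probabilities sum to 1."""
    dist = NormalDist(mu, sigma)
    total_mass = max(bin_mass(dist, bins[0][0], bins[-1][1]), 1e-300)
    nll = 0.0
    for lo, hi, n in bins:
        p = bin_mass(dist, lo, hi) / total_mass
        nll -= n * log(max(p, 1e-300))  # floor avoids log(0) far in the tails
    return nll


def fit_normal():
    """Crude pattern search for (mu, sigma), started from bin-midpoint moments."""
    total = sum(n for _, _, n in bins)
    mu = sum(n * (lo + hi) / 2 for lo, hi, n in bins) / total
    sigma = sqrt(sum(n * ((lo + hi) / 2 - mu) ** 2 for lo, hi, n in bins) / total)
    best, step = neg_log_lik(mu, sigma), sigma / 2
    for _ in range(10_000):  # safety cap; the loop normally stops much earlier
        if step < 1e-3:
            break
        moves = [(mu + step, sigma), (mu - step, sigma),
                 (mu, sigma + step), (mu, sigma - step)]
        value, m, s = min((neg_log_lik(m, s), m, s) for m, s in moves if s > 0)
        if value < best:
            best, mu, sigma = value, m, s
        else:
            step /= 2  # no neighbour improves: refine the step size
    return mu, sigma


def disaggregate(mu, sigma):
    """Split each bin count over its 1 lb units in proportion to the fitted
    normal mass on [k - 0.5, k + 0.5], so each bin total is reproduced."""
    dist = NormalDist(mu, sigma)
    estimates = {}
    for lo, hi, n in bins:
        weights = [bin_mass(dist, k, k) for k in range(lo, hi + 1)]
        norm = sum(weights)
        if norm == 0:  # mass underflowed to zero: fall back to a uniform split
            weights, norm = [1.0] * len(weights), float(len(weights))
        for k, w in zip(range(lo, hi + 1), weights):
            estimates[k] = n * w / norm
    return estimates


mu, sigma = fit_normal()
estimates = disaggregate(mu, sigma)
for lo, hi, n in bins:
    # Each reconstructed bin total should match the observed headcount.
    print(lo, hi, n, round(sum(estimates[k] for k in range(lo, hi + 1))))
```

The estimates are fractional counts, and because each bin is rescaled by its own factor the resulting curve generally jumps at bin boundaries wherever the fitted normal under- or overestimates a bin. Smoothing those jumps while keeping the bin totals would need a different technique from this sketch.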

0 Answers