
How does one go about averaging multiple histograms of a quantity $x$ if the individual histograms do not have the same range? That is, $R_i=\max(x_i)-\min(x_i)$ differs between values of $i$ (where $i$ indexes the realizations of the experiment, one histogram each). The only thing the histograms have in common is the fixed bin width, $b=1$.

How can I perform the averaging whilst making sure that no discrepancy arises for values of $x$ that do not occur in all histograms (as $R_i$'s can be different)? I am interested solely in the methodology here.

user929304
  • Please explain what the mathematical process of "averaging histograms" is intended to represent. Only then would we have a definite basis to recommend an appropriate methodology. – whuber Jun 21 '16 at 13:11
  • @whuber I repeat an experiment $5$ times, with the same exact conditions each time. Each of them gives me a different histogram for the system temperature. From the $5$ together, I intend to compute a single, average, histogram. – user929304 Jun 21 '16 at 13:15
  • (1) Why not combine your datasets into one and draw the histogram of that? (2) Where they overlap, do your histograms use the same breakpoints or not? (3) Is the amount of data exactly the same in each experiment? – whuber Jun 21 '16 at 13:34
  • @whuber I actually thought of doing your suggestion (1) at first; I don't know why I didn't go through with it. I'll go do it now. I guess that counts as averaging of some sort. (2) Yes. (3) No, a different number of data points was gathered from each experiment. – user929304 Jun 21 '16 at 13:38
  • That's good--(1) is your best option. Before you go any further, though, you would appreciate the related information at http://stats.stackexchange.com/questions/51718 . – whuber Jun 21 '16 at 13:40
  • @whuber Many thanks for your assistance, I will also look into the linked post. – user929304 Jun 21 '16 at 13:45
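
Suggestion (1) from the comments above can be sketched roughly as follows. This is only a minimal illustration, assuming the raw measurements of every run are still available; the function name `pooled_histogram` and the numbers in `runs` are made up for the example.

```python
import math
from collections import Counter


def pooled_histogram(runs, bin_width=1.0):
    """Pool every run, then count observations per bin of fixed width.

    Returns (left_edges, counts, density); bin k covers the interval
    [left_edges[k], left_edges[k] + bin_width).
    """
    pooled = [x for run in runs for x in run]
    # Bin index of each value on one grid shared by all runs.
    index = Counter(math.floor(x / bin_width) for x in pooled)
    first, last = min(index), max(index)
    left_edges = [k * bin_width for k in range(first, last + 1)]
    counts = [index.get(k, 0) for k in range(first, last + 1)]
    # Frequency density: count / (total number of points * bin width).
    total = len(pooled)
    density = [c / (total * bin_width) for c in counts]
    return left_edges, counts, density


# Hypothetical measurements from three runs of different length and range.
runs = [
    [3.2, 3.7, 4.1, 4.4, 5.0],
    [4.3, 4.8, 5.5, 6.2],
    [2.1, 3.9, 4.6],
]
left_edges, counts, density = pooled_histogram(runs)
print(left_edges)  # [2.0, 3.0, 4.0, 5.0, 6.0]
print(counts)      # [1, 3, 5, 2, 1]
```

Because all values are binned on one shared grid, runs with different ranges and different numbers of points need no special treatment: a value that shows up in only one run simply contributes its own count.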

1 Answer


So I've had a look around and this is the best answer I have found: http://se.mathworks.com/matlabcentral/answers/59865-how-to-combine-different-histograms

From what I understand, you have to extend the range that the bins cover in every histogram so that they all share one universal range, $R$ instead of $R_i$. I think you can do this by adding empty bins to each histogram. Then you can add the histograms together bin by bin and take the average as usual (dividing by the number of histograms).
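
A rough sketch of that padding idea, assuming each histogram is stored as raw counts keyed by an integer bin index (bin width 1, same breakpoints everywhere). The name `average_on_common_range` and the counts are hypothetical, not taken from the linked page.

```python
def average_on_common_range(histograms):
    """histograms: list of dicts mapping integer bin index -> raw count.

    Returns (bins, summed, mean) over the union of all bin ranges.
    """
    first = min(min(h) for h in histograms)
    last = max(max(h) for h in histograms)
    bins = list(range(first, last + 1))
    # A bin that a histogram does not cover is treated as a count of 0.
    padded = [[h.get(b, 0) for b in bins] for h in histograms]
    summed = [sum(column) for column in zip(*padded)]
    mean = [s / len(histograms) for s in summed]
    return bins, summed, mean


# Hypothetical counts from three histograms with different ranges.
histograms = [
    {3: 2, 4: 2, 5: 1},
    {4: 2, 5: 1, 6: 1},
    {2: 1, 3: 1, 4: 1},
]
bins, summed, mean = average_on_common_range(histograms)
print(bins)    # [2, 3, 4, 5, 6]
print(summed)  # [1, 3, 5, 2, 1]
```

With raw counts and shared breakpoints, `summed` is exactly the count histogram of all data taken together, and `mean` is that histogram divided by the number of histograms, so the shape is the same either way. This no longer holds if each histogram is normalized before averaging.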

Hope that helps.

EhsanF
  • Thanks for your answer. I suspect this would be problematic (honestly, I'm not sure), since introducing 0's into the dataset would strongly underestimate values that do not occur in every histogram, right? For example, if $x=10$ occurs only in histogram $i=2$, with frequency $f(x)=100$, and we have only 5 histograms in total, then in the averaged histogram $x=10$ gets a frequency of $20.$ But is that OK? – user929304 Jun 21 '16 at 12:58
  • Then maybe, and this is a MAYBE, you could try this approach, which I think would be similar to taking a weighted average of every bin: for each bin, you add up its values and divide by the number of histograms that bin appears in. For example, if you have $f(x_1 = 10)=100$ and $f(x_2 = 10) = 20$, you take their average and get $f(x_m = 10) = \frac{100 + 20}{2}=60$. Similarly, if you have $f(x_i = 5) = 80$ for all $i = 1,2,3,4,5$, then you get $f(x_m = 5) = \frac{80+80+80+80+80}{5}=80$. Does this sound logical to you? – EhsanF Jun 21 '16 at 13:34
  • But looking at this answer now, it would be bad if $f(x=10) = 100$ occurred in only one of the histograms: you would then have a bias, in the sense that $x=10$ gets a taller bar even though more observations of $x=5$ have occurred. I think the way whuber describes above could be best: just combine your datasets and make a new histogram. – EhsanF Jun 21 '16 at 13:37
  • Sure, I'll go with whuber's suggestion then, I appreciate your effort. Thanks again. – user929304 Jun 21 '16 at 13:45
  • No worries at all, glad to have been able to 'kinda' help haha. – EhsanF Jun 21 '16 at 13:50
  • Notice that (later) the OP indicated the various histograms represent *different* amounts of data. Since a standard histogram plots frequency *density* (not frequency itself--that would just be a bar chart), simply adding the values as recommended in your reference would be incorrect. – whuber Jun 21 '16 at 13:59
  • Ah yes of course! Thank you for pointing that out whuber! – EhsanF Jun 21 '16 at 14:20
  • @whuber Right, but if I understand correctly, as long as we don't normalize the histogram (by the sum of all frequencies), it is just a bar chart of frequencies (like the one you point to), right? – user929304 Jun 21 '16 at 15:00
  • Actually yeah, and the bin widths are 1 so there might be different amounts of data in each histogram, but frequency density x width is the same for all histograms and width would always be 1. – EhsanF Jun 21 '16 at 15:24
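
The numbers from the comments above can be checked with a small sketch. It assumes five histograms of raw counts with bin width 1, where $x=5$ has count 80 in every histogram and $x=10$ has count 100 in histogram 2 only; all variable names are made up for the illustration.

```python
# Hypothetical raw counts per histogram, keyed by bin value.
counts = [
    {5: 80},
    {5: 80, 10: 100},
    {5: 80},
    {5: 80},
    {5: 80},
]
bins = [5, 10]

# (a) Zero-padded mean: a missing bin counts as 0.
zero_padded = {
    b: sum(h.get(b, 0) for h in counts) / len(counts) for b in bins
}

# (b) Mean over only those histograms that contain the bin.
present_only = {
    b: sum(h[b] for h in counts if b in h) / sum(1 for h in counts if b in h)
    for b in bins
}

# (c) Frequency densities when the histograms hold different amounts of data.
sizes = [sum(h.values()) for h in counts]  # points per histogram
total = sum(sizes)
pooled_density = {b: sum(h.get(b, 0) for h in counts) / total for b in bins}
unweighted_density = {
    b: sum(h.get(b, 0) / n for h, n in zip(counts, sizes)) / len(counts)
    for b in bins
}
weighted_density = {
    b: sum(n * (h.get(b, 0) / n) for h, n in zip(counts, sizes)) / total
    for b in bins
}

print(zero_padded)   # {5: 80.0, 10: 20.0}
print(present_only)  # {5: 80.0, 10: 100.0}
```

In this example, (b) ranks $x=10$ above $x=5$ although there are 400 observations of $x=5$ and only 100 of $x=10$, which is the bias described above. For (c), averaging the per-histogram densities with equal weights does not reproduce the density of the pooled data, whereas weighting each histogram by its number of points does.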