We have data that will get aggregated per hour into the following values
- Q1
- Median
- Mean
- Q3
- Standard Deviation
- Max
- Min
- Count of Values
So the data will look more or like this in the end.
00:00-01:00 01:00-02:00 02:00-03:00 03:00-04:00 ...
--------------------------------------------------------------------------------
Q1 68,72 69,64 64,31 64,40 ...
Median 118,72 124,42 115,54 118,11 ...
Mean 119,17 119,97 117,23 117,60 ...
Q3 169,64 171,72 170,63 168,72 ...
StDev 59,30 59,15 61,23 59,62 ...
Max 219,70 219,44 219,76 219,71 ...
Min 15,02 15,07 15,05 15,05 ...
Count 1000,00 1000,00 1000,00 1000,00 ...
Now we want to aggregate the same values for a whole day (24h) without using the original data if possible (because in our real scenario it would require a significantly longer time to aggregate from those).
For most of them it's pretty straight forward, like MIN is simply the overall MIN, AVG is the overall AVG, etc.
But the tricky part is Q1, Median, Q3 and StDev.
From what I understand it's not possible to simply calculate the (weighted) average value of the 24 separate values. But is there a method to achieve this from already aggregated values (for example by storing some additional data)?
Is the difference from such a huge dataset even significant?
Or will the data always be distorted except for calculating it from the whole dataset?