
I have time series data and I want to calculate the 90th percentile (p90). Because there is a huge amount of data, I need to split the calculation somehow.

I was thinking that I could calculate the percentile of each day's data, then use those daily values to calculate each month's percentile, and then use the monthly values to get the percentile for the whole year. Is this calculation method correct? I think it is not: to solve the problem exactly I would have to use all the data to calculate the yearly percentile, and recalculate every time a new data point arrives.
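A small counterexample suggests the two-stage calculation is not exact. The sketch below uses made-up numbers for three "days" of very different sizes and a nearest-rank percentile definition; both the data and the helper name `percentile` are assumptions for illustration only.

```python
def percentile(values, q):
    """Nearest-rank percentile: the smallest value with at least q% of the data at or below it."""
    ordered = sorted(values)
    rank = (q * len(ordered) + 99) // 100  # ceil(q * n / 100) using integers only
    return ordered[max(rank, 1) - 1]

# Hypothetical data: three days with very different sample counts
days = [
    list(range(1, 101)),  # 100 readings: 1, 2, ..., 100
    [1000, 2000],         # 2 readings, both large
    [5] * 10,             # 10 identical readings
]

daily_p90 = [percentile(day, 90) for day in days]  # [90, 2000, 5]
p90_of_daily = percentile(daily_p90, 90)           # 2000
pooled_p90 = percentile([x for day in days for x in day], 90)  # 91

print(daily_p90, p90_of_daily, pooled_p90)
```

The percentile of the daily percentiles (2000) is far from the percentile of the pooled data (91), because a single number per day says nothing about how many observations that day had or how they were spread.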

Do you know how to solve or approximate this problem?

This problem is different from the continuous P2 approach because I am not only interested in the final percentile: I want the daily, monthly, quarterly and annual percentiles. For example, if I use P2 and, after getting the January percentile, continue the calculation with February, it is not possible to get the isolated February percentile, is it? I would get the January-February one. It would be great to work somehow with the monthly percentiles (for example) to get the annual one.

  • Something similar to median of medians? – user2974951 Aug 07 '19 at 11:35
  • The P2 approach works beautifully in your case. Why do you think it will fail? If you want monthly percentiles, for instance, then just compute percentiles separately for each month. There's no increased computational cost in doing so. But the entire point of the P2 algorithm is that there isn't any way to combine monthly percentiles to get an annual percentile unless you track much more information as you go along. Studying and understanding the answers to the duplicate will make all this clear. – whuber Aug 08 '19 at 11:05
  • Thanks @whuber, I will do that. Do you know of any MATLAB implementation? I couldn't find any. All the best – David Santos Santos Domínguez Aug 08 '19 at 11:26
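One way to read "track much more information" in the comment above is to keep a fixed-bin histogram per period instead of a single percentile. Histograms with the same bins can be merged by adding counts, so monthly summaries combine into an approximate annual one. This is only a sketch of that idea and not the P2 algorithm; the bin width, the data, and the function names `histogram` and `approx_percentile` are all assumptions.

```python
from collections import Counter

BIN_WIDTH = 5  # assumed resolution; the estimate is accurate to about one bin width

def histogram(values, width=BIN_WIDTH):
    """Count values per fixed-width bin; bin k covers [k*width, (k+1)*width)."""
    return Counter(int(v // width) for v in values)

def approx_percentile(hist, q, width=BIN_WIDTH):
    """Upper edge of the first bin whose cumulative count reaches q% of the total."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    target = (q * total + 99) // 100  # nearest-rank position, integer arithmetic
    running = 0
    for k in sorted(hist):
        running += hist[k]
        if running >= target:
            return (k + 1) * width

# Hypothetical monthly data
january = list(range(100))        # 0, 1, ..., 99
february = list(range(200, 250))  # 200, 201, ..., 249

jan_hist = histogram(january)
feb_hist = histogram(february)
year_hist = jan_hist + feb_hist   # merging two periods is just adding bin counts

jan_p90 = approx_percentile(jan_hist, 90)    # 90  (exact nearest-rank value: 89)
feb_p90 = approx_percentile(feb_hist, 90)    # 245 (exact: 244)
year_p90 = approx_percentile(year_hist, 90)  # 235 (exact: 234)
print(jan_p90, feb_p90, year_p90)
```

Each month keeps its own histogram, so the isolated February percentile stays available after the merge. The cost is memory proportional to the number of occupied bins and an error bounded by the bin width. Mergeable quantile sketches such as t-digest refine the same idea with adaptive bins.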
