My company wants to track request latencies for the project I'm working on.
Specifically, the report I'm building should have the 95th and 99th percentile values of the latencies over all events.
However, an intermediate summarization step occurs during processing, and I'm not sure what I need to calculate at that step so that the final aggregation step can produce accurate percentiles.
The primary data is organized in a set of sessions S, where each session has a series of timestamped event pairs (e1, e2).
I need to summarize the latencies (i.e., the timestamp differences from the event pairs) into a small set of numbers (vectors are not available at the intermediate step) for each session S.
In other words, I have sessions $S_1, \dots, S_n$, where session $S_k$ has event pairs $ep_{k,1}, \dots, ep_{k,m_k}$, with $m_k$ the number of pairs in session $S_k$.
I want the 95th and 99th percentile values of the timestamp differences over all event pairs, but I can't directly access all the event pairs in the final calculation; the final calculation can only access a set of summary values calculated per session.
I can perform some arbitrary computation on the event pairs contained in each session, but I cannot have a vector of values in the session summary.
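To make the target concrete, here is a minimal Python sketch of the quantity I'm after, computed the way I would if the final step could see every event pair (which it can't). The session names, the millisecond timestamps, and the helper names `latencies` and `percentile_nearest_rank` are all made up for illustration, and I've assumed the nearest-rank definition of a percentile; other definitions interpolate and would give slightly different numbers.

```python
import math

# Hypothetical data: each session maps to a list of (e1, e2) timestamps in ms.
sessions = {
    "S1": [(0, 120), (1000, 1080), (2000, 2450)],
    "S2": [(0, 200), (5000, 5095)],
    "S3": [(0, 310), (3000, 3150), (4000, 4900), (6000, 6110)],
}

def latencies(pairs):
    """Timestamp difference e2 - e1 for each event pair."""
    return [e2 - e1 for e1, e2 in pairs]

def percentile_nearest_rank(values, p):
    """Nearest-rank percentile (an assumed definition): the smallest value
    such that at least p percent of the data is at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# The target: percentiles over the pooled latencies of ALL sessions.
# The real final step cannot build this pooled list; it only sees a few
# scalar summary values per session.
pooled = [d for pairs in sessions.values() for d in latencies(pairs)]
p95 = percentile_nearest_rank(pooled, 95)
p99 = percentile_nearest_rank(pooled, 99)
```

The question is what small, fixed set of numbers each session should emit so that something close to `p95` and `p99` can still be recovered without the pooled list.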
I hope this is clear enough; I'm a statistics novice.