The commonly available spreadsheet function PERCENTILE / PERCENTILE.INC is described as returning the alpha quantile of a sample.
The 95th percentile is described as the value below which 95% of the samples fall [Wikipedia] - please correct me if that is wrong.
This is commonly employed in "95th Percentile Billing" for services, which claims to "ignore sporadic usage spikes" and is offered as a fairer (and cheaper) alternative to paying for a fixed level. However, companies employ different methods of calculating it: some use p, and some use 1-p. 1) Which is technically correct: p or 1-p?
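As I understand it, the billing method sorts the usage samples, discards the top 5%, and bills at the highest remaining sample. A minimal sketch of that understanding (the function name `billable_rate` is mine, not any provider's published code):

```python
def billable_rate(samples, keep=0.95):
    """Sketch of '95th percentile billing' as commonly described:
    sort the samples, discard the top (1 - keep) fraction, and bill
    at the highest remaining value. My understanding only - actual
    provider implementations vary."""
    ordered = sorted(samples)
    # Number of samples kept after dropping the top 5%.
    kept = max(1, int(len(ordered) * keep))
    return ordered[kept - 1]

print(billable_rate([20, 20, 20, 100, 20]))  # the 100 spike is discarded -> 20
```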
Say I have a series of usage samples indicating transactions per minute of:
A1. 20
A2. 20
A3. 20
A4. 100
A5. 20
It's clear that 80% of the samples are 20 or below, ignoring the spike of 100. The function PERCENTILE(A1:A5,1-0.8)=20, which is what a lay customer would expect, whereas PERCENTILE(A1:A5,0.8)=36, which does not seem to ignore the usage spike at all.
Also, consider the example:
A1. 20
A2. 20
A3. 100
A4. 100
A5. 20
PERCENTILE(A1:A5,0.8)=100 and PERCENTILE(A1:A5,0.95)=100, which clearly seems to be the "upper 5%", while PERCENTILE(A1:A5,0.2)=20 seems more intuitively correct. Note that PERCENTILE(A1:A5,0.05)=20, the sample minimum: because PERCENTILE.INC interpolates between adjacent sorted samples, it can never return a value below the smallest sample (or above the largest).
2) How can I explain how these interpolated percentile values arise, to people who do not have a mathematical background - including myself? 3) Is either p or 1-p technically correct for "the lowest 95 percent"? 4) Is there a formula that would most accurately average the lower 95% of samples? What is it called?
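Regarding question 4: averaging everything except the top 5% of samples is, as far as I know, a one-sided trimmed (truncated) mean - only the upper tail is discarded. A sketch under that assumption (the name `upper_trimmed_mean` and the round-up choice are mine):

```python
import math

def upper_trimmed_mean(samples, trim=0.05):
    """Mean of the samples after discarding the top `trim` fraction.

    A one-sided trimmed mean: spikes in the upper tail are excluded,
    but every ordinary sample still counts toward the average."""
    ordered = sorted(samples)
    drop = math.ceil(len(ordered) * trim)  # round up so a spike is dropped
    kept = ordered[:len(ordered) - drop] or ordered[:1]
    return sum(kept) / len(kept)

# The 100 spike is dropped; the remaining samples average to 20.0.
print(upper_trimmed_mean([20, 20, 20, 100, 20]))  # 20.0
```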
This could easily be a $1 000 000 / year question.