I'd like to compute a median of measurements taken from a population with 3 subgroups, A, B, and C. I'd like the median to be "weighted", in the sense that each group should have equal impact, regardless of how many samples it contributes.
Apologies for not expressing this more formally, but hopefully an example will illuminate. Let the measurements of A, B, and C be: A={5,6,7} B={8,9} C={1,2,3,4}
The population median would simply combine all measurements: MEDIAN({A,B,C}) =MEDIAN({1,2,3,4,[5],6,7,8,9}) = 5.
But what I want to do is assume that A, B, and C are equally represented in the population but were not "fairly" sampled. So the median I want here would be obtained by repeating the measurements of A 4 times, B 6 times, and C 3 times (scaling each set to 12 elements, the LCM of their cardinalities).
WEIGHTED_MEDIAN ({A,B,C})
= MEDIAN({A,A,A,A,B,B,B,B,B,B,C,C,C})
= MEDIAN({5,5,5,5,6,6,6,6,7,7,7,7, 8,8,8,8,8,8,9,9,9,9,9,9, 1,1,1,2,2,2,3,3,3,4,4,4})
= MEDIAN({1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,5,6,[6,6],6,7,7,7,7,8,8,8,8,8,8,9,9,9,9,9,9}) = (6+6)/2 = 6
Naturally, this is easy with very small sets. My question: is there a less expensive approach to compute this? If A, B, and C have even moderately large cardinalities that happen to be prime (e.g. 1009, 1013, and 4919), the desired median would entail expanding A, B, and C into sets A', B', and C' that each have cardinality 5,027,793,523, which is computationally absurd.
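One possible route, sketched here under the assumption that repeating a group of size n up to the LCM is equivalent to giving each of its measurements a weight of 1/(3n): sort the pooled measurements once and walk the cumulative weight until it passes 1/2. The function name `equal_group_median` is made up for this illustration, and exact fractions are used so that a cumulative weight of exactly 1/2 is detected reliably.

```python
from fractions import Fraction

def equal_group_median(groups):
    """Median of the pooled data when every group carries the same total weight."""
    k = len(groups)
    # Each measurement in a group of size n gets weight 1/(k*n),
    # so every group sums to 1/k no matter how many samples it has.
    weighted = sorted((x, Fraction(1, k * len(g))) for g in groups for x in g)
    half = Fraction(1, 2)
    cum = Fraction(0)
    lower = None
    for x, w in weighted:
        cum += w
        if lower is None and cum >= half:
            lower = x  # first value whose cumulative weight reaches 1/2
        if cum > half:
            # first value whose cumulative weight exceeds 1/2;
            # it differs from `lower` only when the 1/2 mark is hit exactly
            return (lower + x) / 2

print(equal_group_median([[5, 6, 7], [8, 9], [1, 2, 3, 4]]))  # 6.0
```

This only sorts the original measurements, so sizes like 1009, 1013, and 4919 would mean sorting 6,941 values instead of building three sets of about 5 billion elements each.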
If there's not a direct simplification, is there an approximation that is reliably close? Would an average of the sub-group medians (weighted equally here) reliably give a good approximation, or are there conditions under which it would skew heavily away from the "true" weighted median? For the example above:
(MEDIAN(A)+MEDIAN(B)+MEDIAN(C))/3
= (6+8.5+2.5)/3
= 17/3
= 5.6667
~= 6
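A quick check of whether the mean of the sub-group medians can drift from the weighted median, using made-up data in which all three groups have the same size (so the weighted median is just the pooled median and no expansion is needed). The values below are hypothetical and chosen only to put one group far from the other two.

```python
from statistics import median

# Hypothetical data: three equally sized groups, one far from the others.
A = [1, 2, 3]
B = [1, 2, 3]
C = [100, 101, 102]

# Candidate approximation: average of the per-group medians.
mean_of_medians = (median(A) + median(B) + median(C)) / 3

# With equal group sizes every sample already has equal weight,
# so the pooled median is the weighted median.
pooled_median = median(A + B + C)

print(mean_of_medians)  # 35.0
print(pooled_median)    # 3
```

Here the mean of medians behaves like a mean: the distant group pulls it to 35, while the weighted median stays at 3. So the approximation seems to break down when the groups are far apart relative to their spread.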
Two variations of this:
Variation 1: How should I handle the case where I know A, B, and C represent 20%, 20%, and 60% of the population? For my example, this would equal the median obtained by repeating the measurements of A 4 times, B 6 times, and C 9 times (resulting in a set of 60 values with median 4).
Variation 2: How can I compute a weighted percentile other than the median, e.g. the 25th or 75th percentile of the weighted results?
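Both variations might be covered by one cumulative-weight pass, sketched below. The assumptions: each group's population share is given as an integer (e.g. 20, 20, 60), each measurement gets weight share/(total share × group size), and the percentile follows the same convention as the expanded-set median (average the two neighbouring values when the cumulative weight lands exactly on the target). Other percentile conventions exist and would give slightly different answers. The name `weighted_quantile` is hypothetical.

```python
from fractions import Fraction

def weighted_quantile(groups, shares, q):
    """q-quantile of pooled data; group i has integer population share shares[i]."""
    total = sum(shares)
    q = Fraction(q)  # pass a Fraction or an exactly representable float such as 0.25
    # Weight per measurement: the group's share of the population,
    # split evenly over that group's samples.
    pairs = sorted(
        (x, Fraction(s, total * len(g)))
        for g, s in zip(groups, shares)
        for x in g
    )
    cum = Fraction(0)
    lower = None
    for x, w in pairs:
        cum += w
        if lower is None and cum >= q:
            lower = x  # first value whose cumulative weight reaches q
        if cum > q:
            return (lower + x) / 2
    return lower  # only reached when q == 1: the maximum value

groups = [[5, 6, 7], [8, 9], [1, 2, 3, 4]]
shares = [20, 20, 60]
print(weighted_quantile(groups, shares, Fraction(1, 2)))  # 4.0
print(weighted_quantile(groups, shares, Fraction(1, 4)))  # 2.0
print(weighted_quantile(groups, shares, Fraction(3, 4)))  # 7.0
```

With shares of 1, 1, 1 and q = 1/2 this reduces to the equal-impact median from the main example, which gives 6.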