
I have two random variables (say x1 and x2) defined by empirical probability distributions, and would like to calculate the median of their sum.

Under what circumstances (in terms of the distributions of x1 and x2) can I assume the median of the sum is equal to the sum of the medians, i.e.

median(x1) + median(x2). (1)

The alternative approach I've used is to randomly generate large samples of x1 and x2 and then calculate the median as

median(sample of x1 + sample of x2). (2)

Approach (1) is quicker and I need to do this calculation many times. Under what circumstances is approach (1) approximately correct? Are there alternatives to my second approach?

I've seen this Q&A: What does it mean if the median or average of sums is greater than sum of those of addends?

---- Additional information after reading the comments

If we have two normally distributed random variables, then the median of the sum is approximately the sum of the medians:

N1 <- rnorm(10000, mean = 1, sd = 0.1)
N2 <- rnorm(10000, mean = 0)

# We expect an answer of 1 and get close

median(N1) + median(N2) #[1] 0.9918688
median(N1 + N2) #[1] 0.9962555
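In fact, for independent normals additivity is exact at the population level: the sum is again normal, its median equals its mean, and means add. A minimal sketch of the population check using R's quantile function:

```r
# Population medians: a normal's median equals its mean, and
# N(1, 0.1^2) + N(0, 1^2) is N(1, 0.1^2 + 1^2), so both quantities below are exactly 1
qnorm(0.5, mean = 1, sd = 0.1) + qnorm(0.5, mean = 0, sd = 1)  # 1
qnorm(0.5, mean = 1, sd = sqrt(0.1^2 + 1^2))                   # 1
```

So the small discrepancies above are purely sampling noise.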

This doesn't work for exponential variables:

set.seed(2002)
e1 <- rexp(100000, 1)
e2 <- rexp(100000, 1)

median(e1) + median(e2) # expect 2* log(2) = 1.386 and get 1.374
median(e1 + e2) # expect 1.678 and get 1.668
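The population values quoted in the comments above can be computed directly: the sum of two independent Exp(1) variables has a Gamma(2, 1) distribution, so its median comes from the Gamma quantile function. A minimal sketch in base R:

```r
# Sum of medians of two Exp(1) variables: 2 * log(2), about 1.386
2 * qexp(0.5)
# Median of the sum: Exp(1) + Exp(1) is Gamma(shape = 2, rate = 1), about 1.678
qgamma(0.5, shape = 2, rate = 1)
```

So the gap here is a genuine population-level difference, not sampling noise.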

So, looking at @Glen_b's comment, is symmetry a sufficient condition that would allow the assumption that the median of the sum is the sum of the medians?
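The symmetric case is easy to check in simulation. A minimal sketch (variable names are my own) with a uniform and a normal variable, both symmetric about their centres:

```r
set.seed(7)
u <- runif(1e5, min = 0, max = 1)  # symmetric about 0.5
z <- rnorm(1e5, mean = 2)          # symmetric about 2
median(u + z)                      # close to 2.5
median(u) + median(z)              # also close to 2.5
```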

Tony Ladson
  • Because the median is a non-linear function, the median of a sum is never equal to the sum of medians – Repmat Jul 29 '19 at 08:53
  • @Repmat Never? That's demonstrably untrue – Glen_b Jul 29 '19 at 09:12
  • @Tony you should clarify exactly what you mean by "sample of x1 + sample of x2". There are a number of sufficient conditions. If you have two independent symmetric random variables, then the median of their sum will be the sum of the medians -- unfortunately empirical cdfs are very rarely symmetric. Another case would be where the two variables are [comonotonic](https://en.wikipedia.org/wiki/Comonotonicity) (because of quantile-additivity), but it's not clear that you're talking about sampling their joint distribution (i.e. sampling pairs of $(x_i,y_i)$ together ... ctd – Glen_b Jul 29 '19 at 09:21
  • ctd... by drawing the index $i$ at random); it sounds like you mean to sample them separately as if they were independent. – Glen_b Jul 29 '19 at 09:21
  • Thank you @Glen_b. I had been sampling them independently and I think that is OK for these variables. I've added a further thought to my question. It seems that, for my purposes, symmetry is the condition I need to be concerned about. – Tony Ladson Jul 30 '19 at 05:44
  • But (1) you won't get exact symmetry with an ecdf unless you're astoundingly lucky (it's useful for population distributions, but only approximate at best with samples from them), and (2) you could get additivity without symmetry – Glen_b Jul 30 '19 at 07:19
  • @Glen_b You are right, I don't have symmetry. Under what other circumstances could I get additivity? Is that the comonotonic condition you mentioned? That isn't relevant in my case. – Tony Ladson Jul 30 '19 at 11:20
  • I was suggesting that there would be other conditions than either of the ones I mentioned. I don't have a characterization for you, but examples that are neither symmetric nor comonotonic are easy to make, so those two are not a complete list. – Glen_b Jul 30 '19 at 12:08

3 Answers


Actually my comment is not entirely correct; allow me to clear things up.

The median of a series of numbers $X$ is calculated by ordering all the numbers from smallest to largest and taking the number in the middle. When you change the numbers in $X$ you also change the ordering, and hence the median. Therefore, in general, you can almost always assume that $$ \text{MED}(X + Y) \neq \text{MED}(X) + \text{MED}(Y) $$ However, there is at least one exception: whenever adding $Y$ to $X$ leaves the ordering of $X$ unchanged, the medians add. For instance, if $Y$ is an exact copy of $X$ (so every number is added to itself), the ordering is preserved; see this example (written in R):

set.seed(42)
n <- 100
x <- rnorm(n)
c <- x          # exact copy of x: adding it to x preserves the ordering
y <- rnorm(n)

median(x + y)         # 0.0767433
median(x) + median(y) # 0.02050838
median(x + c)         # 0.1795935
median(x) + median(c) # 0.1795935
Repmat
  • The OP does ask about when _approximation_ is close enough, and clearly it can be seen in your example that with increasing ``n``, the difference between ``median(x+y)`` and ``median(x)+median(y)`` approaches 0 (?) at some rate. The same can be seen with other symmetric pdfs, as @Glen_b states in the comments. – runr Jul 29 '19 at 12:53

For continuous variables the following are equivalent:

$$\text{M}(X + Y) = \text{M}(X) + \text{M}(Y) \\ \iff \\ \mathbb{P}[(X-\text{M}(X)) > -(Y -\text{M}(Y))] = \mathbb{P}[(X-\text{M}(X)) < -(Y -\text{M}(Y))] $$

You can imagine this geometrically from the joint distribution of $X$ and $Y$: half the mass needs to be on either side of the line $x+y=\text{M}(X)+\text{M}(Y)$ (or equal masses for discrete variables).

In other words, for a random pair $(x, y)$, the probability that $x$ is further above the median of $X$ than $y$ is below the median of $Y$ must equal the probability that $x$ is less far above the median of $X$ than $y$ is below the median of $Y$; equivalently, $\mathbb{P}[X+Y > \text{M}(X)+\text{M}(Y)] = \mathbb{P}[X+Y < \text{M}(X)+\text{M}(Y)]$.
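As a quick numerical illustration of this condition, a minimal sketch with independent standard normals, where additivity should hold by symmetry (the name `p_above` is my own):

```r
set.seed(1)
x <- rnorm(1e5); y <- rnorm(1e5)
# Fraction of the joint sample above the line x + y = median(x) + median(y)
p_above <- mean((x - median(x)) > -(y - median(y)))
p_above  # close to 0.5, so the medians are (approximately) additive
```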

Sextus Empiricus

Comment: This parallels the other comments, but it might give you a quick way to check whether one variable increases precisely when the other does.

If the Spearman correlation between x and y is $1,$ I believe the sum of medians is the median of the sum. In R:

x = rexp(100);  y = sqrt(x)
median(x+y)
[1] 1.598729
median(x)+median(y)
[1] 1.598729
cor(x,y, meth="spearman")
[1] 1

The other situation (approximate) discussed in comments is symmetry:

u = runif(100); z = rnorm(100)
mean(u+z);  median(u+z)
[1] 0.5401409
[1] 0.5229718
mean(u)+mean(z)
[1] 0.5401409
median(u)+median(z)
[1] 0.5866283
BruceET
  • So basically, whenever y is a positive linear function of x, this is true. Makes sense intuitively as well. – ChinG Jul 30 '19 at 02:52
  • @ChinG it can also work with zero correlation. For instance if X and Y are independent normally distributed variables. – Sextus Empiricus Oct 11 '21 at 06:49
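To illustrate that last comment, a minimal sketch (variable names are my own) with two independent normals, which are nowhere near comonotonic yet still approximately median-additive:

```r
set.seed(123)
a <- rnorm(1e5, mean = 2)
b <- rnorm(1e5, mean = -1)
cor(a, b, method = "spearman")  # near 0: not comonotonic
median(a + b)                   # close to 1
median(a) + median(b)           # also close to 1
```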