I was recently asked in an interview when and where not to use the median. I know the median has some limitations, such as requiring an ordered list of numbers, but I am not sure whether there are other limitations that need to be taken into account.
-
Because finding the middle of a set of numbers is straightforward (and does not even require sorting), the median does not require that the data originally be sorted: that's no limitation. Your interviewer was probably interested in your understanding of the *statistical interpretations* of the median of a *dataset.* For instance, your interviewer might have been looking for a reaction to information like https://stats.stackexchange.com/questions/2547 or, possibly, to see whether you understood the concept of [efficiency](https://stats.stackexchange.com/questions/16532). – whuber Sep 16 '20 at 15:51
-
There could be a contextual argument. Economic data, for example, often have multiplicative errors, leading to a right-skewed lognormal distribution in which the median lies below the mean. If one is asked to construct national macro spending statistics for consumer products, projecting from a sample average (rather than from the median, which is biased low for this purpose) would be more appropriate, albeit noisier (the mean is less robust than the median). – AJKOER Sep 16 '20 at 16:53
-
Note: my response does answer the question asking for 'situations' where the median is possibly less appropriate. – AJKOER Sep 16 '20 at 17:02
-
Sometimes the mean is required to ensure "conservation of some integral quantity". For example a river cross section may be simplified as rectangular in a model. Given a fixed width, to preserve area the rectangle must have height = average depth. (See also @AJKOER comment for another case, in "accounting".) – GeoMatt22 Sep 16 '20 at 18:14
-
It would be more helpful to say what kind of job the interview was for. A more technical answer might have been 'trying to estimate the population mean from normal data'. Then the sample mean gives a more precise estimate of the population mean. // @AJKOER's answer is better if the interviewer is looking for basic statistical 'intuition'. Example: you might want to know total payroll based on the number of employees times the "average" salary. In that case, "average" should be the sample mean, not the sample median. – BruceET Sep 16 '20 at 18:30
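The payroll point in the comments above can be checked with a few lines of R. This is only a sketch: the 500 salaries are simulated from a lognormal distribution, and the seed and parameters are illustrative assumptions, not real data.

```r
# Hypothetical payroll: 500 salaries from a right-skewed lognormal
# distribution (seed and parameters are illustrative assumptions).
set.seed(2020)
salary <- rlnorm(500, meanlog = log(50000), sdlog = 0.5)
n <- length(salary)

total_actual      <- sum(salary)
total_from_mean   <- n * mean(salary)    # reproduces the total up to rounding
total_from_median <- n * median(salary)  # falls short for right-skewed data

c(actual = total_actual,
  from_mean = total_from_mean,
  from_median = total_from_median)
```

Multiplying the head count by the mean recovers the total by construction, whereas the median-based figure depends on the shape of the distribution and understates the total when the data are right-skewed.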
1 Answer
Comment continued with simulation:
Find the sample means $A = \bar X$ of 100,000 normal samples of size $n=15$ from a population with $\mu = 100,$ $\sigma = 10.$ In theory, $SD(A) = SD(\bar X) = \sigma/\sqrt{n} = 10/\sqrt{15} = 2.5820.$
set.seed(916)
a = replicate(10^5, mean(rnorm(15, 100, 10)))
mean(a); sd(a)
[1] 100.0052
[1] 2.590418
Do the same for sample medians $H$: $SD(H) \approx 3.19 > 2.58.$ [Because the same seed is used, this simulation uses exactly the same 100,000 samples.]
set.seed(916)
h = replicate(10^5, median(rnorm(15, 100, 10)))
mean(h); sd(h)
[1] 99.99552
[1] 3.190097
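As a rough cross-check on the simulated value, the usual large-sample approximation for the standard deviation of the sample median can be worked out by hand. It is an asymptotic result, so for $n = 15$ it is only approximate. Here $f$ is the normal density and $\eta$ the population median; both symbols are introduced for this sketch and are not in the original comment.

$$
SD(H) \approx \frac{1}{2 f(\eta)\sqrt{n}} = \sqrt{\frac{\pi}{2}}\,\frac{\sigma}{\sqrt{n}} \approx 1.2533 \times 2.5820 \approx 3.236, \qquad f(\eta) = \frac{1}{\sigma\sqrt{2\pi}}.
$$

The simulated 3.19 is a little below this limiting value, which is consistent with the approximation being only asymptotic. The simulated variance ratio is about $(2.59/3.19)^2 \approx 0.66,$ compared with the limiting value $2/\pi \approx 0.64.$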

BruceET