Although this topic has been discussed on this site numerous times, I have yet to read a convincing argument for why the mean is favoured over the median as a measure of central tendency. The preference for the mean is particularly prevalent in financial economics and in decision making under uncertainty, where the expected value and the expectation operator are widely used.

Is it due to efficiency? Does the mean have favourable properties? Or is it perhaps just used for computational convenience?

I may be mixing several things up in asking this question, so if you could clarify any misunderstandings, I would be very grateful.

Stephan Kolassa
Jaffar
  • I posted one likely duplicate. See also [this question](https://stats.stackexchange.com/q/200282/1352) and [this question](https://stats.stackexchange.com/q/2547/1352) and [this question](https://stats.stackexchange.com/q/6913/1352) and [other questions tagged both "mean" and "median"](https://stats.stackexchange.com/questions/tagged/mean+median?sort=votes&pageSize=50). – Stephan Kolassa Jul 26 '17 at 09:28
  • Besides case-specific pros and cons, I have a personal preference. Are you familiar with normal regression models, which model the conditional expectation, and quantile regressions, which model conditional quantiles (like the median)? Perhaps it's a matter of training, but I find quantile regressions bothersome and complicated, even for the easiest cases. I guess the problem is the non-continuity of the median. – KenHBS Jul 26 '17 at 09:42
  • The most convincing discussion I've ever read about a preference for the mean over the median was in Cox and Hinkley's 1974 book, *Theoretical Statistics*. In their view, the mean is a *sufficient* statistic since it uses *all* of the information available from the data in its estimation. On the other hand, the median only uses two pieces of data, the min and the max. This fact renders the median "not sufficient." – Mike Hunter Jul 26 '17 at 13:12
  • https://stats.stackexchange.com/questions/27230/expected-value-for-discrete-nominal-variable/27245#27245 Most people use the mean without thinking about the alternatives. Depending on what you are interested in, there could be better measurements. See the link. – Jessica Jul 26 '17 at 20:51
  • @DJohnson: (1) Makes no sense to talk of a statistic's being sufficient *in general* - it's sufficient (or not) for inference about an unknown parameter indexing a family of distributions. The second example of a sufficient statistic in Cox & Hinkley, Ch. 2.2, p19, is one where the sample maximum is sufficient, not the mean. (2) I don't know what's meant by using "all of the information available from the data" to calculate the mean, but I'm quite sure you can't calculate the median from just the sample minimum & maximum - consider the samples $\{1,2,4\}$ & $\{1,3,4\}$. – Scortchi - Reinstate Monica Oct 08 '17 at 21:38
  • (3) Interestingly, it seems the median is never a sufficient statistic - see [When if ever is a median statistic a sufficient statistic?](https://stats.stackexchange.com/q/122917/17230). That's not the point though, its utility lies in its *robustness* to contamination or mis-specification of the distribution (C.& H., Ch. 9.4). – Scortchi - Reinstate Monica Oct 08 '17 at 21:40
  • @Scortchi Unfortunately, I don't have C&H's book available but it appears you do. You make a compelling, even precise case. So, I'll just have to take you at your word, for now. Given time, I can and will obtain their book from my library and try to dig up the specific pages where the points I referenced are made. That said, it may be that the ravages of time have not done my memory any favors. In other words, I could be completely wrong but would like to be able to verify that this is, in fact, the case. – Mike Hunter Oct 09 '17 at 10:53
  • @scortchi It looks like we are both half right. My point about C&H providing an excellent discussion of the meaning of *sufficient* statistics (as well as completeness, consistency and support) was correct but you are correct that they do not explicitly discuss the *sufficiency* of the mean (much less the median). On this webpage, the link between the *mean* being a sufficient statistic is made explicit (https://turing.une.edu.au/~stat354/notes/node55.html) >> – Mike Hunter Oct 16 '17 at 12:27
  • While this CV thread discusses the *median* in the context of *sufficiency* (https://stats.stackexchange.com/questions/122917/when-if-ever-is-a-median-statistic-a-sufficient-statistic) – Mike Hunter Oct 16 '17 at 12:27
  • While this discussion is very enticing, I am still rather confused as to why the mean remains so prevalent in the literature. https://stats.stackexchange.com/a/306864/171279 This post discusses how the mean minimises the error if we take the Euclidean norm, whereas the median minimises the error if we take the Manhattan distance, but this still doesn't explain why one would prefer the mean over the median. While it may be more convenient to use the mean, I certainly don't think that that is the only reason one might use it. – Jaffar Oct 17 '17 at 15:17
  • @DJohnson: C. &H. *do* discuss sufficiency of the mean - their *first* example is one where the mean is sufficient. (Similarly the web page you link to gives two examples, one where the mean is sufficient & one where it isn't.) Sure, when you're confident that a specified family of distributions indexed by the parameter $\theta$ is an adequate model for your data (so robustness isn't an issue), *and* a sufficient estimator for $\theta$ exists, *and* that sufficient estimator happens to be the mean; then it's hard to think of any reason to use the median to estimate $\theta$ - but ... – Scortchi - Reinstate Monica Oct 19 '17 at 21:42
  • ... in that case no-one argues you should. – Scortchi - Reinstate Monica Oct 19 '17 at 21:42
  • @Scortchi I'd be interested in a page reference where C&H discuss the sufficiency of the mean. Thx. – Mike Hunter Oct 21 '17 at 15:05
  • @DJohnson: Poisson Ex. 2.9, p. 19; binomial Ex.2.15 p. 23; normal Ex. 2.17, p. 25; gamma p.27. Basically just read Chapter 2. – Scortchi - Reinstate Monica Oct 23 '17 at 08:11
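
The point raised in the comments, that the mean minimises the error under the Euclidean norm while the median minimises it under the Manhattan distance, can be checked numerically. The following is only a minimal sketch: the sample `data`, the candidate `grid`, and the helper names `squared_loss` and `absolute_loss` are assumptions made for illustration and do not come from the thread.

```python
import statistics


def squared_loss(c, xs):
    """Sum of squared deviations of xs from the candidate centre c."""
    return sum((x - c) ** 2 for x in xs)


def absolute_loss(c, xs):
    """Sum of absolute deviations of xs from the candidate centre c."""
    return sum(abs(x - c) for x in xs)


# Hypothetical sample with an odd number of points, so the median is unique.
data = [1.0, 2.0, 4.0, 7.0, 50.0]

# Candidate centres 0.0, 0.1, ..., 60.0 (a crude grid search, not an optimiser).
grid = [i / 10 for i in range(601)]

best_sq = min(grid, key=lambda c: squared_loss(c, data))
best_abs = min(grid, key=lambda c: absolute_loss(c, data))

print("squared-loss minimiser: ", best_sq, "| mean:  ", statistics.mean(data))
print("absolute-loss minimiser:", best_abs, "| median:", statistics.median(data))
```

On this sample the grid minimiser of the squared loss coincides with the sample mean and the grid minimiser of the absolute loss with the sample median. This illustrates which loss function each summary corresponds to; it does not by itself say which loss is appropriate for a given problem.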

1 Answer

The main differences between the mean and the median are:

First: The mean uses the value of every observation in the data set to represent its centre, while the median depends only on the middle one or two ordered observations.

Second: As a consequence of the first point, the mean is affected by outliers and extreme values, so if the data contain outliers its value may not describe the actual centre of the data, whereas the median is largely unaffected by extreme values.

Third: We can't compute the mean of qualitative data, but we can describe the centre of qualitative data with the median if the data are ordinal.

Fourth: Most tests and distributions rely on the mean as the measure of centrality used to summarize or analyze data.

These are the simple differences we can rely on when choosing a measure of central tendency.
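
The second and third points can be illustrated with a short sketch using Python's standard `statistics` module. All data here (`clean`, `contaminated`, the ordinal `levels`, and `responses`) are made up for illustration and are not taken from any real data set.

```python
import statistics

# --- Second point: sensitivity to extreme values ---------------------------
clean = [10, 12, 13, 15, 20]
# Replace the largest observation with an extreme value (hypothetical outlier).
contaminated = clean[:-1] + [2000]

mean_clean = statistics.mean(clean)
median_clean = statistics.median(clean)
mean_cont = statistics.mean(contaminated)
median_cont = statistics.median(contaminated)

print("clean:        mean =", mean_clean, " median =", median_clean)
print("contaminated: mean =", mean_cont, " median =", median_cont)

# --- Third point: median of ordinal (ordered qualitative) data -------------
# Assumed ordering of the categories, from lowest to highest.
levels = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
rank = {label: i for i, label in enumerate(levels)}

responses = ["agree", "neutral", "strongly agree", "disagree",
             "agree", "agree", "neutral"]

# median_low always returns an observed value, so the result maps back to a
# real category even when the number of responses is even.
ranks = [rank[r] for r in responses]
median_label = levels[statistics.median_low(ranks)]

print("median response:", median_label)
```

In this example a single extreme value moves the mean a long way while the median stays where it was, and the ordinal responses have a well-defined median category even though averaging the labels would be meaningless.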

Noah16