
I am just starting out with basic statistics, and that is where I came across measures of central tendency. One of my books defines a measure of central tendency as a measure that yields information about the central or middle part of a group of data. As we all know, the mean, median and mode are measures of central tendency. What I am not clear about is how the mode yields information about the central part of a group of data. What if the values that occur most often lie somewhere other than the central part? I am a bit confused here. Thanks in advance for anyone's help.

  • It wouldn't be a good measure of central tendency if you have a multimodal distribution or a distribution that is unimodal but highly skewed. The median is a good measure & the mean can be too. But the mean can be misleading for distributions that are highly skewed, or skewed & heavy-tailed. The mean, median & mode are all good measures for symmetric unimodal distributions that are not heavy-tailed. The Cauchy, for example, is symmetric & unimodal: the median & mode are the same but the mean does not exist. For Gaussian distributions the mean, mode & median are identical & good measures of central tendency. – Michael R. Chernick Nov 11 '19 at 05:40
  • @MichaelChernick - It depends on what you regard as central tendency and how central you want your measure to be. For a unimodal distribution (sensibly defined), the difference between the mean and mode is never more than $\sqrt{3} \sigma$, so it is not intrinsically bad – Henry Nov 11 '19 at 09:00
  • I take central tendency to mean that the measure defines a point at or near the middle of the distribution. So the median is always a good measure. If the mean is not finite, as is the case with the Cauchy, it isn't appropriate to talk about it as a measure of central tendency. There are other examples where the mean & mode can be very different from the median, which makes them inappropriate in those cases. When the distribution is multimodal you have a problem in defining "the" mode & you can have a bimodal distribution where neither mode is close to the middle. – Michael R. Chernick Nov 11 '19 at 14:38
  • Gugaa Srikanth and @MichaelR.Chernick: [Relevant](https://stats.stackexchange.com/questions/96371/should-the-mean-be-used-when-data-are-skewed). Michael, I suspect you are confusing "does the mean behave like the median" with "does the mean measure central tendency". "Central tendency" is poorly defined, and the median's "boundary between upper and lower 50% of distribution" is just *one* way to define it, not The Right way. – Alexis Jul 26 '21 at 02:42

2 Answers


Sometimes the mode is useful as a measure of the 'center' of a distribution. In a right-skewed distribution the mode is often smaller than the median, which is in turn smaller than the mean.

For example, consider a gamma distribution, specifically $\mathsf{Gamma}(\text{shape}=\alpha = 5,\,\text{rate}=\theta=2),$ which has mode $\frac{\alpha-1}{\theta} = 2,$ median $2.335454,$ and mean $\frac{\alpha}{\theta} = 2.5.$ There are convenient formulas for the mode and mean in terms of the shape and rate parameters, but the median has no closed form and must in general be computed numerically, as with the quantile function in R:

qgamma(.5, 5, 2)
[1] 2.335454
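
As a quick check on the three values quoted above, here is a minimal sketch that recomputes them for the same distribution. The variable names and the search interval `c(0, 10)` passed to `optimize` are illustrative assumptions, not part of the original computation.

```r
# Shape and rate of the gamma distribution discussed above
alpha <- 5
theta <- 2

# Closed-form mode and mean
mode_formula <- (alpha - 1) / theta
mean_formula <- alpha / theta

# Numerical check of the mode: maximise the density over an assumed interval
mode_numeric <- optimize(dgamma, interval = c(0, 10),
                         shape = alpha, rate = theta, maximum = TRUE)$maximum

# Median from the quantile function (there is no closed form)
median_numeric <- qgamma(0.5, shape = alpha, rate = theta)

c(mode = mode_formula, mode_check = mode_numeric,
  median = median_numeric, mean = mean_formula)
```

The numerical maximiser should agree with the formula only up to the default tolerance of `optimize`, so small differences in the later decimal places are expected.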

If you have a moderately large sample from a population known to be gamma, but with unknown parameters, then the parameters can be estimated by the method of moments or by the method of maximum likelihood, from which the mean and median can be estimated. (See the Wikipedia link above.)
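
To make the method-of-moments route concrete, here is a rough sketch on simulated data. It relies on the facts that a gamma distribution has mean shape/rate and variance shape/rate², so the two parameters can be solved for from the sample mean and variance. The seed, the sample size and the names `shape_hat`, `rate_hat` and `est` are assumptions chosen only for illustration.

```r
set.seed(2024)                          # arbitrary seed, for reproducibility only
x <- rgamma(10^4, shape = 5, rate = 2)  # simulated data; parameters treated as unknown

# Method of moments: mean = shape/rate, variance = shape/rate^2
m <- mean(x)
v <- var(x)
rate_hat  <- m / v
shape_hat <- m^2 / v

# Plug-in estimates of the population mean, median and mode
est <- c(mean   = shape_hat / rate_hat,
         median = qgamma(0.5, shape = shape_hat, rate = rate_hat),
         mode   = (shape_hat - 1) / rate_hat)
est
```

Maximum likelihood would usually be somewhat more efficient than this, but it needs numerical optimisation, so the moment estimates are shown here for simplicity.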

For a very large sample the population mean is well estimated by the sample mean, and the population median is well estimated by the sample median. The mode can be very roughly estimated by looking at the highest bar in a histogram, or more precisely estimated by locating the maximum of a kernel density estimate of the sample.

Suppose we have a sample of size $n = 10{,}000$ from $\mathsf{Gamma}(5,2).$ Then the sample mean 2.506 and median 2.324 are reasonably good estimates of the population mean and median, as shown below.

set.seed(1111)
x = rgamma(10^4, 5, 2)
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1407  1.6806  2.3244  2.5057  3.1290  9.2822 

Below is a histogram of these 10,000 observations along with a plot of the population density curve [solid black] and a kernel density estimate (KDE) [dotted red].

hist(x, prob=TRUE, col="skyblue2", main="GAMMA(5,2)")
curve(dgamma(x, 5, 2), add=TRUE, lwd=2)              # population density
lines(density(x), col="red", lwd=2, lty="dotted")    # KDE overlaid on the histogram


The highest histogram bar is centered at about 2.25, and there are formulas for trying to refine this to get a slightly better estimate of the population mode (one of them here). (A slightly different choice of histogram bins might improve this estimate.)
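
The crude histogram estimate can also be read off programmatically rather than by eye. The sketch below uses its own simulated sample with an arbitrary seed and R's default binning, so its bins and result need not match the figure described above; `modal_bin` and `hist_mode` are illustrative names.

```r
set.seed(2024)                          # arbitrary seed, for reproducibility only
x <- rgamma(10^4, shape = 5, rate = 2)

h <- hist(x, plot = FALSE)              # default (Sturges) bins, no plot drawn
modal_bin <- which.max(h$counts)        # index of the tallest bar
hist_mode <- h$mids[modal_bin]          # its midpoint, used as the mode estimate

c(lower = h$breaks[modal_bin], mid = hist_mode, upper = h$breaks[modal_bin + 1])
```

Because the density is fairly flat near its peak, neighbouring bins have similar heights, which is why this estimate is sensitive to the choice of bins.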

The density estimate returned by `density` is, by default, evaluated on a grid of 512 x-points with 512 corresponding y-values. The maximum of the KDE, which can be taken as the sample mode 2.082, can be found using `$`-notation in R as shown below. (I have used the default version of the KDE in R.)

kde = density(x)
kde$x[kde$y == max(kde$y)]
[1] 2.081567
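
Since the KDE mode depends on the bandwidth, it may be worth seeing how much the estimate moves under a different bandwidth rule. This is only a sketch under stated assumptions: it draws its own sample with an arbitrary seed, the helper `kde_mode` is a hypothetical name, and the Sheather-Jones rule (`bw = "SJ"`) is just one alternative to the default `"nrd0"`.

```r
set.seed(2024)                          # arbitrary seed, for reproducibility only
x <- rgamma(10^4, shape = 5, rate = 2)

# Location of the KDE maximum for a given bandwidth rule (hypothetical helper)
kde_mode <- function(x, bw) {
  kde <- density(x, bw = bw)
  kde$x[which.max(kde$y)]
}

modes <- c(default = kde_mode(x, "nrd0"), SJ = kde_mode(x, "SJ"))
modes
```

Both estimates should land in the neighbourhood of the population mode 2, but they will generally differ from each other, so a KDE mode is best reported together with the bandwidth used.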
BruceET

The notion that the mode constitutes a measure of "central tendency" is contextual at best, and hinges on the assumption that we are dealing with a distribution with a non-monotonic unimodal density. There are many distributions where the mode is in the tails of the distribution, or even at the extreme edge of its support. For example, the exponential distribution has its mode at zero, which is the extreme left edge of the support; not even close to a measure of central tendency. More generally, one can construct a distribution with any mean and mode one wishes, so it is entirely possible to construct a distribution with a mode that is absurdly far away from the "centre" of the distribution, as measured by the mean.
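
The exponential case is easy to check numerically. The following sketch assumes rate 1 and an evaluation grid on [0, 10], both chosen purely for illustration, and compares the mode with the median and mean.

```r
rate <- 1                                          # assumed rate, for illustration
grid <- seq(0, 10, by = 0.01)                      # assumed evaluation grid

mode_exp   <- grid[which.max(dexp(grid, rate))]    # density is largest at the left edge
median_exp <- qexp(0.5, rate)                      # equals log(2) / rate
mean_exp   <- 1 / rate

c(mode = mode_exp, median = median_exp, mean = mean_exp)

# Share of the distribution lying above the mode
p_above_mode <- 1 - pexp(mode_exp, rate)
p_above_mode
```

All of the probability mass lies above the mode here, so in this case the mode marks the boundary of the support rather than anything resembling its middle.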

My view is that your misgivings here are correct --- you should not consider the mode to be a measure of central tendency unless you have strong a priori reasons to believe you are dealing with a non-monotonic unimodal distribution with a "hump" somewhere near the middle of the distribution. (And obviously judging this presupposes that one already has an a priori view of what constitutes the "middle".) It is best just to ignore this aspect of your textbook.

Ben