
I'd like to calculate the mode for the frequency distribution table:

Interval        Frequency
 0.5 -  4.0         9
 4.0 -  7.5        11
 7.5 - 11.0        11
11.0 - 14.5         7
14.5 - 18.0         2

Thanks!

BruceET
hasan
  • A simple method is to average the values tied for mode, and although they are here intervals, 7.5 is still what I would report on this information. An easy but necessary remark is that I would prefer to see the original data. – Nick Cox Sep 26 '21 at 21:17
  • https://stats.stackexchange.com/questions/176112/how-to-find-the-mode-of-a-probability-density-function includes a method for estimating the mode directly. Using e.g. a kernel density method raises problems of dependence on kernel shape and width that can be almost as severe as those arising from histogram binning. – Nick Cox Sep 26 '21 at 23:58
  • I'd attempt a wry non-technical summary: Modes are likely to be of some interest and use precisely when it is clear what would be a good estimate of the mode, as a really clear peak on a graph of the distribution. Modes are a bit of a dead end for analysis, but can be some help descriptively. – Nick Cox Sep 27 '21 at 09:23

1 Answer


Various authors give different definitions for modes of (ungrouped) samples. There is agreement that sample 1,1,2,2,2,2,3,5 has mode 2. Some would say that sample 1,1,2,3,4,4,5,6 has no (unique) mode, and some would say the sample has a double mode at 1 and 4. Some would say that sample 1,1,2,3,4,4,4,6 has mode 4 and some would say that the sample is bi-modal with major mode at 4.
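
A minimal R sketch of the "all values tied for highest frequency" convention, applied to the three samples above. The helper name `sample_modes` is hypothetical, and a text that reports "no mode" for ties would need a different rule.

```r
# Hypothetical helper: return every value tied for the highest frequency.
sample_modes <- function(v) {
  counts <- table(v)                                # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])  # keep the value(s) with the top count
}

sample_modes(c(1, 1, 2, 2, 2, 2, 3, 5))  # 2
sample_modes(c(1, 1, 2, 3, 4, 4, 5, 6))  # 1 4
sample_modes(c(1, 1, 2, 3, 4, 4, 4, 6))  # 4
```

Under this convention the second sample is reported as having two modes, which is exactly the case where authors disagree.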

For data grouped into intervals of equal length, the 'modal interval' is defined as the interval that has the highest frequency (if there is one). Then various formulas are given for finding 'the mode' within the modal interval.

In your case I would be tempted to say that it makes sense to regard the boundary $7.5$ between the two intervals with frequency $11$ as the mode of the grouped data, but you should check the exact wording in your text or class notes to see what definition is being used in your class.
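
As a hedged illustration, one interpolation formula that appears in many introductory texts is $L + h\,\frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)},$ where $L$ is the lower boundary of the modal interval, $h$ its width, $f_1$ its frequency, and $f_0, f_2$ the frequencies of the neighbouring intervals. This is an assumption about which formula your class uses, not a statement that it is the only one. A sketch in R for the table in the question:

```r
# Class boundaries and frequencies from the question's table.
lower <- c(0.5, 4, 7.5, 11, 14.5)
upper <- c(4, 7.5, 11, 14.5, 18)
freq  <- c(9, 11, 11, 7, 2)
h     <- upper - lower              # every interval has width 3.5

# Interval(s) tied for the highest frequency: here the 2nd and 3rd.
modal <- which(freq == max(freq))

# Assumed textbook interpolation (your text may use a different rule):
#   mode = L + h * (f1 - f0) / ((f1 - f0) + (f1 - f2))
# Missing neighbours at either end are treated as frequency 0.
f1 <- freq[modal]
f0 <- c(0, freq)[modal]             # frequency of the interval just below
f2 <- c(freq, 0)[modal + 1]         # frequency of the interval just above
grouped_mode <- lower[modal] + h[modal] * (f1 - f0) / ((f1 - f0) + (f1 - f2))
grouped_mode                        # 7.5 7.5
```

Applied to either of the two tied intervals, this formula lands on the shared boundary $7.5.$ Note that the formula is undefined when a modal interval and both of its neighbours have equal frequencies.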

One purpose of identifying the 'mode' of a sample may be to estimate the mode of the population distribution. For a discrete distribution the mode is the most probable value (if there is a unique one). For a continuous distribution the mode is the unique point $x$ (if there is one) at which the density function $f(x)$ reaches its maximum value. (Some texts allow for multiple modes.)

Examples: The mode of the distribution $X\sim\mathsf{Binom}(n=4,p=1/2)$ is $x=2$ because $P(X = 2) = 0.375$ is the largest probability. The mode of the distribution $Y\sim\mathsf{Norm}(\mu=50,\sigma=7)$ is $y=50$ because the maximum of the density function is at $y=50.$ Some texts would say that $\mathsf{Binom}(5, .5)$ has a double mode at $2$ and $3,$ and that $\mathsf{Beta}(0.5, 0.5)$ has a double mode at $0$ and $1$ because the density function $f(x)$ approaches $\infty$ as $x$ approaches these two values.
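
These examples can be checked numerically in R. The sketch below uses only base R functions; the variable names, the search interval $(20, 80)$ and the tie tolerance are illustrative choices, not part of any standard recipe.

```r
# Binom(4, 1/2): probabilities for x = 0, ..., 4
p4 <- dbinom(0:4, size = 4, prob = 0.5)
binom4_mode <- (0:4)[which.max(p4)]               # 2, with P(X = 2) = 0.375

# Binom(5, 1/2): use a small tolerance when looking for tied probabilities
p5 <- dbinom(0:5, size = 5, prob = 0.5)
binom5_modes <- (0:5)[abs(p5 - max(p5)) < 1e-12]  # 2 and 3

# Norm(50, 7): maximise the density numerically over an assumed interval
norm_mode <- optimize(dnorm, interval = c(20, 80),
                      mean = 50, sd = 7, maximum = TRUE)$maximum

# Beta(0.5, 0.5): the density grows without bound toward 0 and 1
beta_dens <- dbeta(c(0.001, 0.5, 0.999), shape1 = 0.5, shape2 = 0.5)
```

Here `norm_mode` agrees with $50$ only up to the numerical tolerance of `optimize`, and `beta_dens` is much larger near the endpoints than at $0.5.$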

To illustrate estimating a population mode from a sample, suppose we have $n=1000$ observations from $\mathsf{Norm}(\mu=50,\sigma=7),$ generated and summarized in R as follows:

set.seed(2021)
x = rnorm(1000, 50, 7)   # n = 1000, mean 50, SD 7
mean(x)
[1] 50.08926
hist(x, ylim=c(0,300), labels=TRUE, col="skyblue2",
     main="Sample of 1000 from NORM(50,7)")

[Figure: histogram of the sample, titled "Sample of 1000 from NORM(50,7)", with bar counts labeled]

Because the mode of a normal distribution is the same as its mean, the best estimate of the mode is the sample mean $50.08926.$ If you try to use the histogram instead, you can see that the modal interval is $(50,55]$ with frequency $252.$ It is unlikely that any formula for estimating the modal value within that interval would give as good an estimate as the sample mean. Of course, the population mode is exactly $50.$
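
For comparison, here is a sketch of what a within-interval formula would give for this sample. It assumes the common textbook interpolation $L + h\,\frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)}$ (one of several in use) and R's default histogram breaks; the names `hh` and `hist_mode` are illustrative.

```r
set.seed(2021)
x <- rnorm(1000, 50, 7)

hh <- hist(x, plot = FALSE)       # bin counts and breaks, without drawing
k  <- which.max(hh$counts)        # index of the (first) modal interval

f1 <- hh$counts[k]                # modal frequency
f0 <- c(0, hh$counts)[k]          # frequency of the bin below (0 if none)
f2 <- c(hh$counts, 0)[k + 1]      # frequency of the bin above (0 if none)
h  <- diff(hh$breaks)[k]          # width of the modal interval

# Assumed interpolation formula; other texts use other rules.
hist_mode <- hh$breaks[k] + h * (f1 - f0) / ((f1 - f0) + (f1 - f2))

c(histogram = hist_mode, mean = mean(x))
```

The histogram-based value depends on where the bin boundaries happen to fall, which is one reason to prefer the sample mean when the population is known to be normal.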

Technical note: In R, it is possible to get a reasonable estimate of the density function of a population from a sufficiently large sample. There are various styles of density estimators. (Roughly speaking, you can think of a density estimator as a 'smoothed' histogram, but it's not based on the histogram shown above.)

Below we show R's default density estimator based on the sample above. Its maximum is at $50.73,$ which is not much different from the sample mean $50.09.$

de = density(x)   # default Gaussian kernel and bandwidth
est.mode = mean(de$x[de$y == max(de$y)]);  est.mode
[1] 50.73098
plot(de, ylab="Density", xlab="x",
     main="Density Estimate")
abline(v=est.mode, col="red")

[Figure: default density estimate of the sample, titled "Density Estimate", with a red vertical line at the estimated mode]
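
The location of the peak depends on the bandwidth, so the default is only one reasonable choice. The sketch below, using a hypothetical helper `kde_mode`, recomputes the estimated mode for a few values of the `adjust` multiplier and for the Sheather-Jones bandwidth (`bw = "SJ"`), both of which are documented arguments of `density`.

```r
set.seed(2021)
x <- rnorm(1000, 50, 7)

# Hypothetical helper: location of the highest point of a kernel density estimate.
kde_mode <- function(v, ...) {
  de <- density(v, ...)           # extra arguments are passed on to density()
  de$x[which.max(de$y)]
}

# Narrower (0.5), default (1) and wider (2) bandwidths.
adjust_values   <- c(0.5, 1, 2)
modes_by_adjust <- sapply(adjust_values, function(a) kde_mode(x, adjust = a))

# Sheather-Jones bandwidth selector instead of the default rule of thumb.
mode_sj <- kde_mode(x, bw = "SJ")

rbind(adjust = adjust_values, mode = modes_by_adjust)
```

If the estimates differ noticeably across bandwidths, that is a sign the sample does not pin down the mode very precisely.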

BruceET