I agree with the sentiment of Nick Cox's comment that this isn't a straightforward question.
I'm not sure that there really is an 'established method', but in this answer I will offer some approaches that could be attempted. I'm going to assume that we're dealing with data distributions rather than theoretical distributions.
Approach 1: Peak-Finding Algorithms
Scortchi's comment was onto something. Simply counting the local maxima of the probability/frequency distribution would quantify multimodality. This can be done with the "Straightforward Algorithm" described in these notes from MIT. Nick Cox identified an issue: some of these local maxima may be due to chance (e.g., sampling error) or to processing steps (e.g., binning). Such variations can grossly inflate the number of modes identified.
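To make the straightforward check concrete, here is a minimal sketch (the function name and NumPy usage are my own choices, not taken from the MIT notes): a point counts as a local maximum if it strictly exceeds both of its immediate neighbors.

```python
import numpy as np

def count_local_maxima(y):
    """Naive mode count: a point is a local maximum if it is strictly
    greater than both immediate neighbors. Endpoints and plateaus are
    ignored, and noise bumps are counted just like real modes."""
    y = np.asarray(y)
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))
```

This illustrates exactly the fragility Nick Cox pointed out: every tiny wiggle in a histogram passes the inequality test and gets counted as a mode.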
There are other peak-finding algorithms that are more robust to noise and small artifacts than simply checking inequalities between immediately neighboring points. One approach is to set a threshold for how prominent a peak must be relative to its neighboring values, which is what this SciPy implementation does.
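As a sketch of how this might look in practice, the following uses `scipy.signal.find_peaks` with its `prominence` argument on a kernel density estimate rather than a raw histogram; the simulated data, the KDE smoothing step, and the particular prominence threshold are all illustrative assumptions on my part.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical bimodal sample: two well-separated Gaussian components
data = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])

# Smoothing the empirical distribution with a KDE before peak finding
# suppresses binning artifacts relative to a raw histogram
grid = np.linspace(data.min(), data.max(), 512)
density = gaussian_kde(data)(grid)

# Require a minimum prominence so small noise bumps are not counted
# as modes; the threshold value here is an arbitrary illustration
peaks, _ = find_peaks(density, prominence=0.02)
print(f"Estimated number of modes: {len(peaks)}")
print("Mode locations:", grid[peaks])
```

The prominence threshold is doing the real work here: it trades sensitivity to small modes against robustness to noise, and choosing it is itself a judgment call.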
Approach 2: Cluster Analysis
Cluster analysis is itself a complicated subject. There are not only many clustering algorithms, but also many types of clustering. I cannot hope to give clustering a complete treatment here, but it is essentially about grouping data points together based on some notion of how similar or dissimilar they are. Grouping the data points usually means partitioning the dataset, although there are exceptions such as fuzzy clustering algorithms, which allow data points to belong to multiple groupings to at least some extent. How to measure the similarity or dissimilarity of data points is also a topic that could lead to extensive discussion, but essentially you need a dissimilarity function on pairs of points whose values can be ordered (often a metric). Some algorithms determine the number of clusters themselves, while others take the number of clusters as a hyperparameter that can be tuned, for example with cross-validation.
Clustering relates to multimodality through the number of groupings. While some algorithms don't partition the data, for those that do, the number of groupings can be considered a loose measure of the number of 'substantial' modes. This is especially true for distance-based or density-based clustering, where points that are close together are likely to be put into the same grouping.
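As one concrete possibility, mean shift is a density-based algorithm that seeks the modes of an estimated density and determines the number of clusters itself, so its cluster count can serve as a rough mode count. Here is a minimal sketch using scikit-learn; the simulated data and the bandwidth quantile are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(0)
# Same hypothetical bimodal sample as above, reshaped for scikit-learn
data = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])
X = data.reshape(-1, 1)

# Mean shift does not take the number of clusters as input; it climbs
# the estimated density, so each cluster center sits near a mode
bandwidth = estimate_bandwidth(X, quantile=0.2)
ms = MeanShift(bandwidth=bandwidth).fit(X)
n_clusters = len(np.unique(ms.labels_))
print(f"Estimated number of modes: {n_clusters}")
print("Cluster centers:", ms.cluster_centers_.ravel())
```

Note that the bandwidth plays the same role as the prominence threshold in Approach 1: a smaller bandwidth finds more, finer modes, and a larger one merges them.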
I caution that the choice of clustering method matters here: the first clustering algorithm you happen to come across may not be useful for this purpose.