I agree with the sentiment of Nick Cox's comment that this isn't a straightforward question.
I'm not sure that there really is an 'established method', but in this answer I will offer some approaches that could be attempted. I'm going to assume that we're dealing with data distributions rather than theoretical distributions.
Approach 1: Peak-Finding Algorithms
Scortchi's comment was onto something. Simply counting the local maxima of the probability/frequency distribution would quantify multimodality. This can be done with the "Straightforward Algorithm" described in these notes from MIT. Nick Cox identified an issue: some of these local maxima may be due to chance (e.g., sampling error) or to processing steps (e.g., binning). Such variations can grossly inflate the number of modes identified.
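To make the straightforward check concrete, here is a minimal sketch (the function name and NumPy usage are my own choices, not taken from the MIT notes): a point counts as a local maximum if it strictly exceeds both of its immediate neighbors.

```python
import numpy as np

def count_local_maxima(y):
    """Naive mode count: a point is a local maximum if it is strictly
    greater than both immediate neighbors. Endpoints and plateaus are
    ignored, and noise bumps are counted just like real modes."""
    y = np.asarray(y)
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))
```

This illustrates exactly the fragility Nick Cox pointed out: every tiny wiggle in a histogram passes the inequality test and gets counted as a mode.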
There are other peak-finding algorithms that are more robust to noise and small artifacts than simply checking inequalities between immediately neighboring points. One approach is to set a threshold for how prominent a peak must be relative to its neighboring values, which is what this SciPy implementation does.
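As a sketch of how this might look in practice, the following uses `scipy.signal.find_peaks` with its `prominence` argument on a kernel density estimate rather than a raw histogram; the simulated data, the KDE smoothing step, and the particular prominence threshold are all illustrative assumptions on my part.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical bimodal sample: two well-separated Gaussian components
data = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])

# Smoothing the empirical distribution with a KDE before peak finding
# suppresses binning artifacts relative to a raw histogram
grid = np.linspace(data.min(), data.max(), 512)
density = gaussian_kde(data)(grid)

# Require a minimum prominence so small noise bumps are not counted
# as modes; the threshold value here is an arbitrary illustration
peaks, _ = find_peaks(density, prominence=0.02)
print(f"Estimated number of modes: {len(peaks)}")
print("Mode locations:", grid[peaks])
```

The prominence threshold is doing the real work here: it trades sensitivity to small modes against robustness to noise, and choosing it is itself a judgment call.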
Approach 2: Cluster Analysis
Cluster analysis is itself a complicated subject. There are not only many clustering algorithms, but also many types of clustering. I cannot hope to give clustering a complete treatment here, but it is essentially about grouping data points together based on some notion of how similar or dissimilar they are. Grouping the data points usually means partitioning the dataset, although there are exceptions such as fuzzy clustering algorithms, which allow data points to belong to multiple groupings to at least some extent. How to measure the similarity or dissimilarity of data points is also a topic that could lead to extensive discussion, but essentially you need a dissimilarity function on pairs of points whose values can be ordered (often a metric). Some algorithms determine the number of clusters themselves, while others take the number of clusters as a hyperparameter that can be tuned, for example with cross-validation.
Clustering relates to multimodality through the number of groupings. While some algorithms don't partition the data, for those that do, the number of groupings can be considered a loose measure of the number of 'substantial' modes. This is especially true for distance-based or density-based clustering, where points that are close together are likely to be put into the same grouping.
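As one concrete possibility, mean shift is a density-based algorithm that seeks the modes of an estimated density and determines the number of clusters itself, so its cluster count can serve as a rough mode count. Here is a minimal sketch using scikit-learn; the simulated data and the bandwidth quantile are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(0)
# Same hypothetical bimodal sample as above, reshaped for scikit-learn
data = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 1, 500)])
X = data.reshape(-1, 1)

# Mean shift does not take the number of clusters as input; it climbs
# the estimated density, so each cluster center sits near a mode
bandwidth = estimate_bandwidth(X, quantile=0.2)
ms = MeanShift(bandwidth=bandwidth).fit(X)
n_clusters = len(np.unique(ms.labels_))
print(f"Estimated number of modes: {n_clusters}")
print("Cluster centers:", ms.cluster_centers_.ravel())
```

Note that the bandwidth plays the same role as the prominence threshold in Approach 1: a smaller bandwidth finds more, finer modes, and a larger one merges them.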
I caution that the choice of clustering method matters here: the first clustering algorithm you happen to come across may not be useful for this purpose.