The mean as a useful descriptor of the process creating the distribution
The mean is often of interest because it relates to parameters of the underlying process that the distribution describes.
This can also be true for skewed distributions such as the Poisson distribution, whose mean equals its rate parameter λ.
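A quick simulation illustrates this. It is only a sketch under assumptions: the rate λ = 3.5 and the sample size are arbitrary, and `sample_poisson` and `sample_mean` are hypothetical helpers (Knuth's multiplication method, used here only to stay within the Python standard library). Even though the Poisson distribution is right-skewed, the sample mean lands close to the rate parameter.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    # Keep multiplying uniforms until the running product drops to exp(-lam) or below;
    # the number of extra multiplications needed is Poisson(lam) distributed.
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def sample_mean(lam: float, n: int, seed: int = 0) -> float:
    """Average of n simulated Poisson(lam) draws (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    draws = [sample_poisson(lam, rng) for _ in range(n)]
    return sum(draws) / n


print(sample_mean(3.5, 100_000))  # close to the rate parameter 3.5
```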
On the other hand, a bimodal or multimodal distribution is often a mixture of distributions, each with its own mean. In that case the mean of the mixture is not a very useful descriptor for understanding the distribution; it can even fall in a region where the density is low.
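To make this concrete, here is a minimal sketch with a hypothetical equal-weight mixture of N(-3, 1) and N(+3, 1); the component parameters, the window width of 0.5 and the helper names are illustrative assumptions. The mixture mean sits near 0, where very few observations fall, while roughly a fifth of the draws lie within 0.5 of the mode at +3.

```python
import random


def sample_mixture(n: int, seed: int = 0) -> list[float]:
    """Hypothetical mixture: each draw comes from N(-3, 1) or N(+3, 1) with probability 1/2."""
    rng = random.Random(seed)
    return [rng.gauss(-3.0 if rng.random() < 0.5 else 3.0, 1.0) for _ in range(n)]


def fraction_near(xs: list[float], centre: float, half_width: float = 0.5) -> float:
    """Share of observations within half_width of centre."""
    return sum(abs(x - centre) <= half_width for x in xs) / len(xs)


xs = sample_mixture(100_000)
mixture_mean = sum(xs) / len(xs)
print(f"mixture mean: {mixture_mean:.2f}")
print(f"share within 0.5 of the mean: {fraction_near(xs, mixture_mean):.3f}")
print(f"share within 0.5 of the mode at +3: {fraction_near(xs, 3.0):.3f}")
```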
The mean as useful in the application of the distribution
One case where the mean can still be useful, even when it has little to do with the mechanics of the process creating the distribution, is when the mean plays a role in the application.
For instance, if your application involves a sum of variables, then the distribution of the sum is what matters. For a large number n of independent variables with finite variance, that sum approximately follows a normal distribution with a single mode, centred at n times the mean (central limit theorem).
Example: say you need to decide how much food to buy for the buffet on a cruise ship, and a bimodal distribution describes the eating patterns of individual passengers. What matters then is the distribution of the total amount eaten by everyone on the ship.
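A rough simulation of the buffet example, with every number a hypothetical assumption: 60% "light eaters" consuming N(1.0, 0.2) kg per day and 40% "heavy eaters" consuming N(2.5, 0.3) kg, so the per-passenger mean is 1.6 kg and the per-passenger variance is 0.6 kg². With 1000 passengers the individual distribution is clearly bimodal, but the daily total behaves like a single-peaked normal distribution centred at n times the mean (about 1600 kg, with a standard deviation near √600 ≈ 24.5 kg).

```python
import random
import statistics

# Hypothetical per-passenger daily food consumption (kg): a bimodal mixture.
WEIGHT_LIGHT, MU_LIGHT, SD_LIGHT = 0.6, 1.0, 0.2
MU_HEAVY, SD_HEAVY = 2.5, 0.3
PER_PASSENGER_MEAN = WEIGHT_LIGHT * MU_LIGHT + (1 - WEIGHT_LIGHT) * MU_HEAVY  # 1.6 kg
N_PASSENGERS = 1000


def passenger_intake(rng: random.Random) -> float:
    """One passenger's consumption: light eater or heavy eater, chosen at random."""
    if rng.random() < WEIGHT_LIGHT:
        return rng.gauss(MU_LIGHT, SD_LIGHT)
    return rng.gauss(MU_HEAVY, SD_HEAVY)


def simulate_daily_totals(n_days: int = 1000, seed: int = 1) -> list[float]:
    """Total consumption of the whole ship for each simulated day."""
    rng = random.Random(seed)
    return [sum(passenger_intake(rng) for _ in range(N_PASSENGERS)) for _ in range(n_days)]


totals = simulate_daily_totals()
centre = N_PASSENGERS * PER_PASSENGER_MEAN  # n times the per-passenger mean
spread = statistics.stdev(totals)
share_within_one_sd = sum(abs(t - centre) <= spread for t in totals) / len(totals)
print(f"n * mean = {centre:.0f} kg, simulated average total = {statistics.mean(totals):.0f} kg")
print(f"standard deviation of the total: {spread:.1f} kg")
print(f"share of days within one SD of n * mean: {share_within_one_sd:.2f}")  # near 0.68 if normal
```

For the purchasing decision, only the mean (and the spread) of the total matter; the two humps in the individual distribution have effectively disappeared.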
An example highlighting the difference between the two cases in this answer is the use of different cost functions in optimization: one cost function for the fitting procedure and another for the actual optimization target. For instance, the mean might be what the application needs (it minimizes the expected squared error loss), while the median of a sample from the distribution can be the better estimator when the goal is to describe the distribution itself: http://stats.stackexchange.com/a/492143
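The link between loss functions and summaries can be checked numerically. This sketch uses a small made-up sample with one outlier and a brute-force grid search (both illustrative choices, not taken from the linked answer): the point minimizing total squared error is the sample mean, while the point minimizing total absolute error is the sample median.

```python
import statistics


def squared_loss(sample: list[float], c: float) -> float:
    """Total squared error when every observation is summarized by c."""
    return sum((x - c) ** 2 for x in sample)


def absolute_loss(sample: list[float], c: float) -> float:
    """Total absolute error when every observation is summarized by c."""
    return sum(abs(x - c) for x in sample)


sample = [1.0, 2.0, 2.5, 3.0, 10.0]  # made-up sample with one outlier
grid = [i / 100 for i in range(0, 1201)]  # candidate summaries 0.00 .. 12.00
best_sq = min(grid, key=lambda c: squared_loss(sample, c))
best_abs = min(grid, key=lambda c: absolute_loss(sample, c))
print(best_sq, statistics.mean(sample))  # 3.7 3.7
print(best_abs, statistics.median(sample))  # 2.5 2.5
```

Which summary is "best" therefore depends on which loss the application actually cares about.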
An analogy for the usefulness of the mean when what matters is the application is the centre of mass in physics. If you want to describe the motion of an asteroid in the solar system, the exact shape of the asteroid is not very important, and we do the computations with its centre of mass (some effects, such as tidal forces and radiation pressure, make the shape slightly important). In the same way in statistics, the centre of probability mass (the mean) may not describe the shape of a probability distribution well, but it could be the only thing that matters in the application.
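Written out, the analogy holds term by term: replacing the masses m_i by probabilities p_i (which sum to 1) turns the centre-of-mass formula into the mean, and the same holds for a density f(x) in the continuous case.

```latex
x_{\text{cm}} = \frac{\sum_i m_i x_i}{\sum_i m_i},
\qquad
\mu = \frac{\sum_i p_i x_i}{\sum_i p_i} = \sum_i p_i x_i \quad \text{since } \sum_i p_i = 1,
\qquad
\mu = \int x \, f(x) \, dx .
```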