
We see Benford's Law in many real-world data sets, with the general derivation being that if things are distributed symmetrically on a log scale, then the law holds. However, it's not obvious to me:

  1. Why is it the case that most real-world data are distributed on a log scale instead of a linear scale? What processes do we have to be aware of that produce these different kinds of data sets?
  2. Furthermore, we tend to assume the prior (especially in machine learning) that normal distributions are always over the linear scale, but if we analyze a data set and notice that Benford's law holds, is it then a better prior to assume that the data are normally distributed on a log scale instead? Is it the case that people tend to model distributions as normal when, in reality, they are actually log-normal (which I assume means the same thing as normal over a log scale)? (A quick simulation sketch of this comparison appears after this list.)
  3. Nassim Nicholas Taleb, in _The Black Swan_, discusses how people tend to assume a normal distribution when, in reality, most things follow a log or power-law scale. Is assuming a normal instead of a log-normal distribution a good example of this?
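
As a rough way to probe question 2, here is a minimal sketch (synthetic data only; all parameter choices are illustrative, not taken from any real data set): it simulates a log-normally distributed sample, compares the log-likelihood of a normal fit against a log-normal fit (i.e. a normal fit on the log scale), and tabulates the first-digit frequencies against Benford's prediction $\log_{10}(1 + 1/d)$.

```python
# Minimal sketch: simulate log-normal data, compare a normal vs. log-normal
# fit via log-likelihood, and check first-digit frequencies against Benford's
# law. All parameter values (mean=0, sigma=2, sample size) are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=2.0, size=100_000)  # spans several orders of magnitude

def normal_loglik(x):
    mu, sd = x.mean(), x.std()
    return np.sum(-0.5 * np.log(2 * np.pi * sd**2) - (x - mu)**2 / (2 * sd**2))

def lognormal_loglik(x):
    lx = np.log(x)
    mu, sd = lx.mean(), lx.std()
    # log-normal density = normal density of log(x), times the Jacobian 1/x
    return np.sum(-0.5 * np.log(2 * np.pi * sd**2) - (lx - mu)**2 / (2 * sd**2) - lx)

print("normal log-likelihood:    ", normal_loglik(x))
print("log-normal log-likelihood:", lognormal_loglik(x))

# First-digit frequencies vs. Benford's prediction log10(1 + 1/d)
first_digit = (x / 10.0 ** np.floor(np.log10(x))).astype(int)
observed = np.bincount(first_digit, minlength=10)[1:] / len(x)
benford = np.log10(1 + 1 / np.arange(1, 10))
for d, (o, b) in enumerate(zip(observed, benford), start=1):
    print(f"digit {d}: observed {o:.3f}, Benford {b:.3f}")
```

On synthetic data like this the log-normal fit wins and the digit frequencies track Benford's law closely; the interesting question is how often real data sets behave the same way.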
  • What does it mean for things to be "distributed symmetrically on a log scale"? Benford's law often comes close to being satisfied by distributions widely dispersed over several orders of magnitude, which is not really the same thing. But, for example, a random variable with a log-normal distribution concentrated on a narrow range (e.g. with underlying $\sigma$ much closer to $0$ than to $1$) meets your criterion but will not satisfy Benford's law – Henry Dec 26 '20 at 02:38
  • People have observed that [Benford's Law](https://en.wikipedia.org/wiki/Benford%27s_law) for first digits applies to a daily list of stock prices--whether expressed in dollars, pounds, euros, or yen. Obviously, _individual_ first digits will change upon converting from one currency to another, but the log nature of Benford's law is such that the _distribution_ of first digits will not change because of such a conversion. – BruceET Dec 26 '20 at 05:02
  • Benford's Law arises when data are spread across so many orders of magnitude that the "wrapping" imposed by focusing on the first significant digit creates a near uniform distribution. For an example of this, see the last method I describe at https://stats.stackexchange.com/a/117711/919, where I analyze the wrapping of a Normal distribution. Normality is not necessary for the result, but serves only as a helpful frame of reference to analyze what's going on. – whuber Dec 26 '20 at 13:20
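
Following up on Henry's and whuber's comments, here is a sketch (again with made-up, illustrative parameters) showing that a narrowly concentrated log-normal ($\sigma = 0.1$) does not follow Benford's law, while one dispersed over several orders of magnitude ($\sigma = 3$) comes very close. The first digit depends only on the fractional part of $\log_{10} x$, and when $\log_{10} x$ is spread over many units, that "wrapped" fraction is nearly uniform, which is exactly the condition Benford's law needs.

```python
# Sketch of the dispersion point: a narrowly concentrated log-normal
# (sigma = 0.1) fails Benford's law, a widely dispersed one (sigma = 3)
# nearly satisfies it. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
benford = np.log10(1 + 1 / np.arange(1, 10))

def first_digit_freq(x):
    digits = (x / 10.0 ** np.floor(np.log10(x))).astype(int)
    return np.bincount(digits, minlength=10)[1:] / len(x)

for sigma in (0.1, 3.0):
    x = rng.lognormal(mean=0.0, sigma=sigma, size=200_000)
    print(f"sigma = {sigma}")
    print("  observed:", np.round(first_digit_freq(x), 3))
    print("  Benford: ", np.round(benford, 3))
```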
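
And a companion sketch of BruceET's scale-invariance observation: rescaling widely dispersed data by an arbitrary constant (standing in for a currency conversion; the factor 3.7 is made up) changes individual first digits but leaves their distribution essentially unchanged, whereas for narrowly concentrated data the distribution shifts completely.

```python
# Scale-invariance check: compare first-digit frequencies before and after
# multiplying by an arbitrary "exchange rate" of 3.7 (an illustrative number).
import numpy as np

rng = np.random.default_rng(1)

def first_digit_freq(x):
    digits = (x / 10.0 ** np.floor(np.log10(x))).astype(int)
    return np.bincount(digits, minlength=10)[1:] / len(x)

for sigma in (0.1, 3.0):
    x = rng.lognormal(mean=0.0, sigma=sigma, size=200_000)
    shift = np.abs(first_digit_freq(3.7 * x) - first_digit_freq(x)).max()
    print(f"sigma = {sigma}: max change in first-digit frequencies "
          f"after rescaling by 3.7 = {shift:.3f}")
```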

0 Answers