I am fairly new to this and it has been more than a decade since I touched statistics at university, so please bear with me.

I am following an introductory book on machine learning and it has been a blast so far. I can generally follow along, or search my way through the math I have not touched in a decade. However, while reading the author's solution to an exercise, I saw the following statement:

Let's look at the exponential distribution we used, with scale=1.0. Note that some samples are much larger or smaller than 1.0, but when you look at the log of the distribution, you can see that most values are actually concentrated roughly in the range of exp(-2) to exp(+2), which is about 0.1 to 7.4.

And then shows the following graphs:

[Figure: the exponential distribution with scale=1.0 (left) and the log of its samples (right)]

Furthermore, later the author also states (About the reciprocal distribution now):

The distribution we used for C looks quite different: the scale of the samples is picked from a uniform distribution within a given range, which is why the right graph, which represents the log of the samples, looks roughly constant. This distribution is useful when you don't have a clue of what the target scale is:

And then this:

[Figure: the reciprocal distribution (left) and the log of its samples (right)]

I know that log is the inverse of the exp function, but I cannot see the value of showing this, or what information the author is getting from it. This makes me feel confused, then angry, and soon I may turn green and incredible, as I could not find any clear answer as to why this is done here. Even a textbook reference I should read to refresh these concepts (in case it is something extremely basic I am missing) would be greatly appreciated.

Navarro

1 Answer

I agree with you that "log of a distribution" is an ambiguous term. There are two things you could take the logarithm of in this context: the values or the probabilities. Both are used, in different contexts and for different reasons. Both cases are described in depth on our site, so for more details you can check the threads "When (and why) should you take the log of a distribution (of numbers)?" and "Why are log probabilities useful?" (see also other threads with the relevant tag for even more examples where logarithms are useful). In the examples shown in the plots, what is shown appears to be the log-transformed values, so it is a transformation of the random variable.
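To make the "log-transformed values" reading concrete, here is a minimal sketch that checks the book's two claims numerically. It is not the book's code: it uses only the Python standard library, the helper `bin_fractions` is a hypothetical name introduced for illustration, and the reciprocal range 20 to 200000 is taken from the numbers mentioned in the comments. The reciprocal samples are drawn by inverse-CDF sampling, which is one standard way to do it and an assumption about how one might reproduce the plots.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is reproducible
N = 100_000

# Exponential distribution with scale=1.0 (rate = 1/scale = 1.0).
exp_samples = [random.expovariate(1.0) for _ in range(N)]

# Fraction of samples inside [exp(-2), exp(+2)], i.e. log-values inside [-2, 2].
lo, hi = math.exp(-2), math.exp(2)
frac_inside = sum(lo <= x <= hi for x in exp_samples) / N

# Exact value from the exponential CDF F(x) = 1 - exp(-x).
exact_inside = math.exp(-lo) - math.exp(-hi)

# Reciprocal (log-uniform) distribution on [a, b], sampled through the
# inverse CDF: x = a * (b / a) ** u with u uniform on [0, 1).
a, b = 20.0, 200_000.0
rec_samples = [a * (b / a) ** random.random() for _ in range(N)]
log_rec = [math.log(x) for x in rec_samples]


def bin_fractions(values, left, right, n_bins):
    """Fraction of values falling in each of n_bins equal-width bins."""
    counts = [0] * n_bins
    width = (right - left) / n_bins
    for v in values:
        i = min(int((v - left) / width), n_bins - 1)
        counts[i] += 1
    return [c / len(values) for c in counts]


raw_bins = bin_fractions(rec_samples, a, b, 5)                   # original scale
log_bins = bin_fractions(log_rec, math.log(a), math.log(b), 5)   # log scale

print(f"exponential: simulated {frac_inside:.3f}, exact {exact_inside:.3f}")
print("reciprocal, raw-scale bins:", [round(f, 3) for f in raw_bins])
print("reciprocal, log-scale bins:", [round(f, 3) for f in log_bins])
```

Roughly 87% of the exponential samples fall between exp(-2) and exp(+2), which is what "most values are concentrated in that range" means. For the reciprocal samples, the raw-scale bins are dominated by the first one, while the log-scale bins each hold about a fifth of the samples, which is the "roughly constant" right-hand plot.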

Tim
  • Thanks a lot! There is a lot of information in those links. Also, you are right as the right-side graphs are log-transformed values of the left-side distributions. In the log, what does the X axis represent? – Navarro Aug 31 '20 at 18:49
  • The natural logarithms. You can tell because $\log(200000)\approx 12$. – whuber Aug 31 '20 at 18:58
  • Now I see the light. Thank you so much @whuber – Navarro Aug 31 '20 at 19:00
  • In the pictures, why does the log of an exponential distribution become bell-curved, whereas the log of a reciprocal distribution becomes uniform-looking, even though the original exponential distribution and the original reciprocal distribution look alike (right-skewed)? – develarist Sep 01 '20 at 00:03
  • The log of the exponential distribution becomes bell-shaped because most of its values lie between 0.1 and ~7, and $\ln 0.1$ is roughly $-2.3$ while $\ln 7$ is roughly $1.9$. For the second graph, I am not sure. Since the values of the distribution range from 20 to 200000, their logarithms range from about $3.0$ to $12.2$. The logarithm grows quite fast for values up to ~50,000, where it reaches ~10.8, and then it flattens and grows very slowly to ~12. This makes sense, as it is what the plot of the logarithm function looks like. – Navarro Sep 01 '20 at 08:58
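
The question raised in the comments, why the log of the exponential looks bell-shaped while the log of the reciprocal looks flat, can be worked out with the change-of-variables formula for densities. What follows is a sketch under stated assumptions: $X$ is the original random variable with density $f_X$, $Y = \ln X$ is its log-transform, and the reciprocal distribution is assumed to have the usual density proportional to $1/x$ on $[a, b]$, with $a = 20$ and $b = 200000$ taken from the range mentioned in the comments.

$$
f_Y(y) = f_X(e^y)\, e^y
$$

For the exponential distribution with scale 1, $f_X(x) = e^{-x}$ for $x > 0$, so

$$
f_Y(y) = e^{-e^y}\, e^y = \exp\left(y - e^y\right), \qquad y \in \mathbb{R}.
$$

This density has a single peak at $y = 0$ (where $1 - e^y = 0$) and falls off on both sides, which gives the bump seen in the plot. It is not symmetric: the left tail decays like $e^{y}$, the right tail much faster.

For the reciprocal distribution on $[a, b]$, $f_X(x) = \dfrac{1}{x \ln(b/a)}$, so

$$
f_Y(y) = \frac{1}{e^y \ln(b/a)}\, e^y = \frac{1}{\ln(b/a)}, \qquad \ln a \le y \le \ln b.
$$

The factor $e^y$ from the transformation cancels the $1/x$ in the density exactly, leaving a constant. With $a = 20$ and $b = 200000$ this is a uniform density of about $0.109$ on roughly $[3.0, 12.2]$. So the two right-skewed shapes differ in how fast they decay: exponentially in one case, like $1/x$ in the other, and only the $1/x$ decay is flattened completely by taking logs.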