
The advantage of log probabilities over direct probabilities, as discussed here and here, is that they make numerical values close to $0$ easier to work with. (Unlike those links, my question focuses on why one measure that doesn't use log probabilities is widely approved in practice and preferred over a different measure that does, despite the advantages of the latter.)

The real advantage is in the arithmetic. Log probabilities are not as easy to interpret as probabilities (for most people), but every time you multiply two probabilities together (other than $1 \times 1 = 1$), you end up with a value closer to $0$. Dealing with numbers very close to $0$ is unstable in finite-precision arithmetic, so working with logs makes things much more stable and in some cases quicker and easier.
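
For instance, here is a minimal sketch (the setup and numbers are mine, not from the links above) of how the direct product underflows while the sum of logs stays well-behaved:

```python
import numpy as np

# 200 small probabilities; their product is far below the smallest
# positive double (~1e-308), so it underflows to exactly 0.0.
rng = np.random.default_rng(0)
p = rng.uniform(1e-4, 1e-2, size=200)

direct = np.prod(p)           # 0.0 after underflow
via_logs = np.sum(np.log(p))  # an ordinary negative number, around -1000

print(direct, via_logs)
```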

Basically, log probabilities (which are used in Shannon entropy) are a work-around for naively multiplying probabilities together (as done with Gini measures).

Why then would Gini impurity (or Gini coefficient, which has a different formula) be preferable and more intuitive than Shannon entropy if it multiplies probabilities together?

  • $\textit{Gini}: \mathit{Gini}(X) = 1 - \sum_{i=1}^{n} p(x_i)^2$
  • $\textit{Entropy}: H(X) = -\sum_{i=1}^{n} p(x_i)\log p(x_i)$
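
As a concrete illustration, here is a minimal sketch of both measures applied to a discretized histogram (the function names and example counts are mine):

```python
import numpy as np

def gini_impurity(p):
    """Gini(X) = 1 - sum_i p(x_i)^2 for a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def shannon_entropy(p):
    """H(X) = -sum_i p(x_i) log p(x_i), with 0 log 0 taken as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                 # drop empty bins so log(0) never occurs
    return -np.sum(nz * np.log(nz))

# e.g. bin counts from a discretized continuous variable, normalized to sum to 1
counts = np.array([5.0, 30.0, 45.0, 15.0, 5.0])
p = counts / counts.sum()
print(gini_impurity(p), shannon_entropy(p))
```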

Someone here said logarithms are too complicated to calculate, but I don't see how hard it could be, given that it is just a button on a calculator. And as noted, log probabilities are more stable than multiplied/squared probabilities.

Note: the scope of my question is directed more towards non-classification problems dealing with discretized histograms of continuous random variables and real-valued numerical applications, but any explanation might be helpful.

  • Please give **sources** for your quotations. Those who don't care can ignore the sources, but others will be helped by giving sources, say to follow up an interesting quotation or even to check on context or form a view of whether you are quoting a competent authority. – Nick Cox Oct 22 '20 at 08:19

1 Answer


From the computational complexity side

Your sentence:

Someone here said logarithms are too complicated to calculate, but I don't see how hard it could be, given that it is just a button on a calculator. And as noted, log probabilities are more stable than multiplied/squared probabilities.

is not accurate from the standpoint of implementation and the practical cost of computing the final results on a computer. You can find the complexity of computing logarithms and of multiplication in this source. As you can see, the complexity of computing the logarithm of an $n$-digit number is $O(M(n) \log{n})$ (using the arithmetic-geometric mean iteration), where $M(n)$ is the complexity of the multiplication method used inside the logarithm computation. Hence, the logarithm is slower than multiplication by at least a factor of $\log{n}$, and the gap compounds when the operation is repeated many times in a computation, as you have mentioned.
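
For a rough sense of the constant factors in practice, here is a micro-benchmark sketch (the setup is mine; absolute timings depend entirely on hardware and the library used):

```python
import timeit
import numpy as np

# Compare the per-element cost of squaring vs. taking logs over the
# same array of probabilities; squaring is typically several times faster.
p = np.random.default_rng(0).uniform(0.01, 0.99, size=1_000_000)

t_square = timeit.timeit(lambda: p ** 2, number=100)
t_log = timeit.timeit(lambda: np.log(p), number=100)
print(f"square: {t_square:.3f}s   log: {t_log:.3f}s")
```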

  • thanks I needed to know. by the way, I don't see logarithm listed in the table in your first link. is it a special case hidden in one of the ones listed? – develarist Oct 21 '20 at 13:59
  • @develarist my pleasure. Please search "Arithmetic–geometric mean iteration" in the source of the link. – OmG Oct 21 '20 at 14:02
  • I hope someone can add an answer regarding the non-complexity side of the question – develarist Oct 21 '20 at 14:03
  • Do you think the advantage of the stable numerical precision introduced in log probabilities is not worth the price of its computational complexity? And that the faster calculation of non-log probabilities is in turn worth their numerical fragility? – develarist Oct 21 '20 at 14:06
  • speed of computation is the main if not the only reason Gini coefficient is still popular today, as far as I know. Anyway, you should specify your application and its context if you want more details on the difference between the two – carlo Oct 21 '20 at 14:07
  • @carlo generally the italicized footnote in the question should suffice – develarist Oct 21 '20 at 14:07
  • fact is, it doesn't – carlo Oct 21 '20 at 14:08
  • @carlo I've edited it to "discretized histograms of continuous random variables" as opposed to classification/binary problems. Isn't Gini designed for the latter, but not the former? – develarist Oct 21 '20 at 14:11
  • I find this answer irrelevant to most statistical applications because (a) the asymptotic complexity of computing logarithms doesn't matter and (b) often those logarithms are obtained through initial data (such as an algorithm that directly returns the log of a probability) and from then on, only *addition* of logarithms is required (as in computing a log likelihood). – whuber Oct 21 '20 at 14:31
  • @develarist Gini coefficient was designed a century ago as a measure of heterogeneity. Depending on what you use it for, applying it to binned numerical variables may or may not give the same results as entropy; they are quite consistent metrics – carlo Oct 21 '20 at 17:17
  • @whuber it is relevant enough for random forest classification, which is probably the single most popular application of Gini coefficient – carlo Oct 21 '20 at 17:19
  • @Carlo Please explain how the *number of digits* in the computer's representation of numbers is relevant to the performance of random forests! – whuber Oct 21 '20 at 17:31
  • @whuber squaring is faster than taking the logarithm, by a factor of n; not that what n is matters much, I guess – carlo Oct 21 '20 at 17:34
  • @Carlo As I explained in an earlier comment, it is rare that computing logarithms is even an issue: in many, if not most, statistical applications the logarithms are already available and one is only taking linear combinations of them. Perhaps we are agreeing with each other, though, as your comment about the asymptotics suggests. – whuber Oct 21 '20 at 17:36
  • I think @carlo may be confusing different beasts both named for Gini. The measure here -- the formula is explicit -- is nothing to do with a measure widely used in studies of income inequality. Certainly the measure here depends on binning, whether the data come like that or the binning is the researcher's own step. Gini as applied to (e.g.) incomes doesn't require binning, although often the data do arrive binned. – Nick Cox Oct 22 '20 at 12:32
  • no I'm not confusing them, what made you think so? – carlo Oct 22 '20 at 12:51
  • Good. My guesses were (1) in many fields Gini coefficient is understood to mean a measure of (income) inequality (2) although there is good reason for naming the measure in the question for Gini that is not always done and many other names have been used. If those guesses didn't apply to you, fine, but the explanation might help some others. – Nick Cox Oct 22 '20 at 13:49
  • is the formula given in the question correctly called "Gini impurity", or is it a different Gini measure? – develarist Oct 23 '20 at 04:24