Why do we use squared probabilities in the Gini impurity?

Question

Why do we use squared probabilities instead of plain probabilities in Gini impurity? Probabilities are always non-negative, so why square them?
Any leads would be highly appreciated. Thanks in advance.

Gini impurity [does not](https://victorzhou.com/blog/gini-impurity/#recap) use squared probabilities, so it is unclear what you mean? — Tim, Aug 03 '20 at 23:32
@Tim The Gini Impurity formula that you linked to can be rewritten as $1-\Sigma_i^C p(i)^2$, which does use them. — dimitriy, Aug 03 '20 at 23:47
@Daya Do you want a math-y explanation? Do you seek intuition? Is there a particular area, like classification with trees, where you are using GI that would make a good example? — dimitriy, Aug 04 '20 at 00:06
@DimitriyV.Masterov Yes, I want to know the mathematical explanation for that equation. It would be a great help if you could explain it or provide a link where I can find my answer. — Daya, Aug 04 '20 at 01:47
I think Tim's link has good intuition. There are some older, mathier questions that you can discover by searching this site, such as this [one](https://stats.stackexchange.com/q/308885/7071). — dimitriy, Aug 04 '20 at 03:20
https://stats.stackexchange.com/questions/473702/why-is-absolute-loss-not-a-proper-scoring-rule — kjetil b halvorsen, Aug 04 '20 at 22:06

score 0 · Answer 1 · edited Jul 27 '21 at 08:12

The above answers are all excellent. When I had the same question, I managed to get an intuition for this by simply doing the following:

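# Sum of squared class probabilities for a 2-class node,
# with p(A) = i and p(B) = 1 - i, for i = 0.0, 0.1, ..., 0.9
temp = []
for j in range(0, 10):
    i = j / 10.0
    num1 = i * i              # p(A)^2
    num2 = (1 - i) * (1 - i)  # p(B)^2
    temp.append(num1 + num2)
print(temp)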

temp = [1.0, 0.8200000000000001, 0.6800000000000002, 0.58, 0.52, 0.5, 0.52, 0.58, 0.6800000000000002, 0.8200000000000001]

As you can see, the sum of squares is largest when the probabilities approach the extreme values (0 and 1) and smallest (0.5) when both classes are equally likely. In Gini impurity, that is what we want: we want the split that makes the class probabilities in each child node as extreme as possible, i.e. one child should contain only members of class A and the other only members of class B (if this is a 2-class problem).
As you can see from the above, that is achieved when you maximize the sum of squares of probabilities, which is the same as minimizing the Gini impurity $1-\Sigma_i p(i)^2$.
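
To tie this back to the formula $1-\Sigma_i^C p(i)^2$ from the comments, here is a minimal sketch of the impurity itself. The helper name `gini_impurity` is my own, purely for illustration, and is not taken from any library. It shows that the impurity is zero for a pure node and peaks when the two classes are equally likely, which mirrors the sum-of-squares values printed earlier.

```python
def gini_impurity(probs):
    """Gini impurity 1 - sum(p_i^2) for a list of class probabilities.

    Hypothetical helper for illustration; assumes probs sums to 1.
    """
    return 1.0 - sum(p * p for p in probs)


# Two-class case: p is the probability of class A, 1 - p that of class B.
for j in range(0, 11):
    p = j / 10.0
    print(p, round(gini_impurity([p, 1 - p]), 4))
```

A pure node such as `[1.0, 0.0]` gives an impurity of 0, while `[0.5, 0.5]` gives 0.5, the largest value possible with two classes. Another way to read the same quantity: $1-\Sigma_i p(i)^2 = \Sigma_i p(i)(1-p(i))$, which is the probability that two items drawn independently from the node belong to different classes. That is where the squares come from.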
