
Why do we use squared probabilities instead of plain probabilities in Gini impurity? Probabilities are always non-negative, so why square them?
Any leads would be highly appreciated. Thanks in advance.

Daya
  • Gini impurity [does not](https://victorzhou.com/blog/gini-impurity/#recap) use squared probabilities, so what do you mean? – Tim Aug 03 '20 at 23:32
  • @Tim The Gini impurity formula that you linked to can be rewritten as $1-\sum_{i=1}^{C} p(i)^2$, which does use them. – dimitriy Aug 03 '20 at 23:47
  • @Daya Do you want a math-y explanation? Do you seek intuition? Is there a particular area, like classification with trees, where you are using GI that would make a good example? – dimitriy Aug 04 '20 at 00:06
  • @DimitriyV.Masterov Yes, I would like a mathematical explanation of that equation. It would be a great help if you could explain it or point me to a link where I can find the answer. – Daya Aug 04 '20 at 01:47
  • I think Tim's link has good intuition. There are some older, mathier questions that you can discover by searching this site, such as this [one](https://stats.stackexchange.com/q/308885/7071). – dimitriy Aug 04 '20 at 03:20
  • https://stats.stackexchange.com/questions/473702/why-is-absolute-loss-not-a-proper-scoring-rule – kjetil b halvorsen Aug 04 '20 at 22:06
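
For the math: assuming the formula behind Tim's link is the per-class form $G = \sum_{i=1}^{C} p(i)\,(1 - p(i))$, the rewriting in dimitriy's comment takes one line and uses only the fact that the class probabilities sum to 1:

$$
G = \sum_{i=1}^{C} p(i)\bigl(1 - p(i)\bigr)
  = \sum_{i=1}^{C} p(i) - \sum_{i=1}^{C} p(i)^2
  = 1 - \sum_{i=1}^{C} p(i)^2 .
$$

So the squares are not there to make anything positive. If $p(i)$ is the chance that a randomly drawn element belongs to class $i$, then $\sum_{i} p(i)^2$ is the probability that two independent draws land in the same class, and $G$ is the probability that they land in different classes. It is 0 for a pure node and largest when all classes are equally likely.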

1 Answer


The answers above are all excellent. When I had the same question, I got an intuition for it by simply running the following:

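```python
# Sum of squared class probabilities for a two-class node, p = 0.0, 0.1, ..., 0.9
temp = []
for j in range(0, 10):
    i = j / 10.0               # probability of class A
    num1 = i * i               # squared probability of class A
    num2 = (1 - i) * (1 - i)   # squared probability of class B
    temp.append(num1 + num2)
print(temp)
```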

Output:

```
[1.0, 0.8200000000000001, 0.6800000000000002, 0.58, 0.52, 0.5, 0.52, 0.58, 0.6800000000000002, 0.8200000000000001]
```
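
For contrast, here is a minimal sketch (an illustration added here, not part of the original answer) that repeats the two-class loop but also records the plain, unsquared sum. Because the two class probabilities always add up to 1, the plain sum is 1 for every value of `p` and cannot tell a pure node from a 50/50 one. Only the squared sum changes with the class balance. The names `plain_sums` and `squared_sums` are hypothetical.

```python
# Compare plain and squared sums of the two class probabilities p and 1 - p.
plain_sums = []
squared_sums = []
for j in range(0, 11):  # include p = 1.0 this time
    p = j / 10.0
    q = 1.0 - p
    plain_sums.append(p + q)
    squared_sums.append(p * p + q * q)

# The plain sum is always 1 (up to float rounding), so it carries no information.
print([round(s, 4) for s in plain_sums])
# The squared sum is 1.0 at the extremes and smallest (0.5) at p = 0.5.
print([round(s, 4) for s in squared_sums])
```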

As you can see, the sum of squares is largest (1.0) when one of the probabilities is at an extreme value (0 or 1) and smallest (0.5) when the two classes are equally likely. In Gini impurity, the extremes are what we want: we want to split the node so that the class probabilities in each child move toward 0 or 1, i.e. one child contains only members of class A and the other only members of class B (in a 2-class problem).
From the above, that is achieved when you maximize the sum of squared probabilities, which is the same as minimizing the Gini impurity $1-\sum_{i=1}^{C} p(i)^2$.
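
A minimal sketch of how this is used when choosing a split, assuming the common CART-style criterion of weighting each child's impurity by its share of the samples (lower is better). The functions `gini` and `split_gini` and the toy labels are hypothetical, written only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_i p(i)^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a binary split (lower is better)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical parent node with 4 members of class A and 4 of class B.
pure_split = (["A"] * 4, ["B"] * 4)                          # each child holds one class
partial_split = (["A", "A", "A", "B"], ["A", "B", "B", "B"])  # 75/25 children
mixed_split = (["A", "A", "B", "B"], ["A", "A", "B", "B"])    # 50/50 children

print(gini(["A"] * 4 + ["B"] * 4))  # parent: 0.5
print(split_gini(*pure_split))      # 0.0
print(split_gini(*partial_split))   # 0.375
print(split_gini(*mixed_split))     # 0.5
```

The split whose children are closest to pure gets the lowest weighted impurity, which is exactly the "extreme probabilities" preference described above.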

Dave2e