
I have been struggling with this topic for a while, and there are multiple answers on this site, but none of them completely answers my question. For example, an answer to Which class to define as positive in unbalanced classification states that all classification algorithms are symmetric with regard to which class is called positive and which is called negative, or which is assigned 0 and which 1. My question goes a little further:

Does the probability change when you redefine the labels? I understand that algorithms are symmetric with regard to which class is called positive/negative, but that doesn't mean the probability stays the same if you swap the labels (or does it? I have always labeled the class I'm focused on predicting as 1, so I assumed that a probability of 0.9 meant the observation is likely to be 1, and that if I had labeled that class as 0 the probability would be 0.1). To be more specific, we have these four cases: 1A, 1B, 2A and 2B, and to avoid misunderstandings we will define them as positive class = buy and negative class = don't buy.

[Table: for each of Cases 1A, 1B, 2A and 2B, the columns id, the X variables, y, Score and 1-Score.]

In the table above we have, for each case, an id, multiple predictor variables X and a dependent variable y. The difference between cases A and B is how the label is defined: for A, buy = 1 and don't buy = 0, and vice versa for B; cases 1 are balanced datasets and cases 2 are unbalanced ones. Once the label is defined, we run a classification algorithm that gives the Score or 1-Score column.

So, my questions:

  • Q1: I define the positive class as 1 (as in Case 1A), we run some classification algorithm, and we get the Score column. Then, if I define the positive class as 0 (as in Case 1B) and we run the same classification algorithm, do I get the Score column or the 1-Score column? I always thought that the algorithm gives you Score or 1-Score depending on how you encode (as 0 or 1) the positive/negative class. In other words, I thought the algorithm would always report the probability of the class that you defined as 1. Is this right, or will the classification algorithm always give you the same probability regardless of how you encode (as 0 or 1) the positive/negative class? (For this question you could use the observations from the table above with id = 1 and id = 4: both have a high Score, but they belong to different classes and are labeled differently in cases A and B. See also the first sketch after this list.)

  • Q2: Does every algorithm have a loss function to minimize? Once the algorithm finishes minimizing, we measure the model's performance with an evaluation metric. The scores are computed via the loss function, so the probabilities (the score columns in the table above) depend on the loss function and are independent of the evaluation metric. If that statement is right, my question is: if I have unbalanced data, does any loss function account for the imbalance, or does the loss function minimize the same way no matter how unbalanced the data is? Going further, does the loss function change its minimization if we swap the labels (0 and 1, as in Case 2A vs 2B) and the data is unbalanced? (I understand we shouldn't use SMOTE to balance the data but rather the appropriate statistical tools for unbalanced data, as you can see here, here, here or here. See also the second sketch after this list.)
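
For Q1, here is a minimal sketch one could run to test this, using scikit-learn's LogisticRegression on synthetic balanced data (the dataset, the model and the tolerance are my own stand-ins, not anything from the table above): it fits the same model under both labelings and compares the predicted probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Balanced synthetic data as a stand-in for Cases 1A/1B
X, y = make_classification(n_samples=1000, random_state=0)

# Case 1A: buy = 1
model_a = LogisticRegression().fit(X, y)
proba_a = model_a.predict_proba(X)   # columns are [P(class 0), P(class 1)]

# Case 1B: buy = 0, i.e. every label is flipped
model_b = LogisticRegression().fit(X, 1 - y)
proba_b = model_b.predict_proba(X)

# P(buy) is column 1 for model A and column 0 for model B;
# the two estimates coincide up to solver tolerance
print(np.allclose(proba_a[:, 1], proba_b[:, 0], atol=1e-4))  # True
```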
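
And for Q2, a companion sketch on imbalanced synthetic data (again my own stand-in for Cases 2A/2B): a plain log-loss treats every observation the same no matter how rare its class, while an explicit class weight changes what the minimization targets. Flipping the labels, by contrast, still only mirrors the probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Imbalanced synthetic data (~95% "don't buy") as a stand-in for Cases 2A/2B
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)

# Plain log-loss: every observation contributes equally, rare class or not
plain = LogisticRegression().fit(X, y)

# Weighted log-loss: 'balanced' reweights each class inversely to its
# frequency, so the minority class pulls on the fit as hard as the majority
weighted = LogisticRegression(class_weight="balanced").fit(X, y)

print(plain.predict_proba(X)[:3, 1])     # small P(buy) estimates
print(weighted.predict_proba(X)[:3, 1])  # typically shifted upward

# Flipping the labels on the same imbalanced data (Case 2A vs 2B) still only
# mirrors the probabilities; the imbalance does not break that symmetry
flipped = LogisticRegression().fit(X, 1 - y)
print(np.allclose(plain.predict_proba(X)[:, 1],
                  flipped.predict_proba(X)[:, 0], atol=1e-4))  # True
```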

Chris
  • For Q1, what about my answer in the linked question does not address your concern? – Dave Jul 14 '21 at 17:23
  • Hi again @Dave! Nope, I asked another question (on which I put a bounty) that you answered with that link, and I deleted it because the linked question answered it. But the linked question doesn't cover my question about probabilities specifically (which was my intention, not so well addressed in my first, deleted question xD), so I created this very specific one. I'm struggling because if the Score computation doesn't depend on how you encode the classes, then how does it actually use the label values to minimize the loss? – Chris Jul 14 '21 at 17:33
  • $L(y, p) = -\sum_i \Big[y_i\log(p_i) + (1-y_i)\log(1-p_i)\Big]$ is the cross-entropy loss function, so it considers both labels by turning off one of the two terms and taking the negative log of the probability assigned to the observed class. // Remember that $P(1) = 1-P(0)$ in the binary setting. // Q2 really seems like a separate question that warrants its own post. – Dave Jul 14 '21 at 17:39
  • Thanks @Dave! Do you have some links I could read to better understand how that formula applies to this problem? I'm honestly kind of lost there. And Q2 mixes Q1 with unbalanced data, so I thought it would fit better here; could you explain why you think it would be better as a separate post? – Chris Jul 14 '21 at 22:12
  • You have the $y_i$ (truth) quantities; your model gives you a corresponding prediction $p_i$. Crank through the math (or have a computer do it for you like we do these days). – Dave Jul 15 '21 at 00:18
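
To make the formula from the comments concrete, a tiny worked example (the y and p values below are invented for illustration, not taken from the table or from any fitted model): flipping both the labels and the probabilities, as in Case A vs Case B, leaves the cross-entropy value unchanged.

```python
import numpy as np

def cross_entropy(y, p):
    # Negative log-likelihood; for each i, one of the two terms "turns off"
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Labeling A (buy = 1): made-up truths and predicted P(y = 1)
y_a = np.array([1, 0, 0, 1])
p_a = np.array([0.9, 0.2, 0.3, 0.8])

# Labeling B (buy = 0): both the labels and the probabilities flip
y_b = 1 - y_a
p_b = 1 - p_a

print(cross_entropy(y_a, p_a))  # ~0.9083
print(cross_entropy(y_b, p_b))  # identical: relabeling leaves the loss unchanged
```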
