Premise
I saw an interesting example of a machine learning logistic classifier for modeling/predicting sentiment in customer reviews. One of the first things in the example was a note on probabilities, namely what fraction of reviews were expected to be positive (the user liked the product). It was written in the following math notation, followed by a quote:
$P(y=1)=.7$
Degrees of belief: "I expect 70% of rows to have y=1 (exact number will vary for each specific dataset, as each dataset would be a different subset of the underlying population)."
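As I understand it, that number is just the sample mean of the labels. A toy sketch of the estimate I have in mind (assuming 0/1 labels in a NumPy array; the data here is made up):

```python
import numpy as np

# Hypothetical 0/1 sentiment labels (1 = positive review)
y = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

p_hat = y.mean()   # empirical estimate of P(y=1)
print(p_hat)       # 0.7 for this toy sample
```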
When I think about that quote, it makes sense at a surface level, and the math around probabilities and conditional probabilities isn't too overbearing, but my intuition hasn't been able to reconcile exactly why this information is important. Assuming two outcomes, $y=0$ and $y=1$, $P(y=1)=.7$ would mean 70% of reviews are positive and 30% are negative. With a large enough sample size, we could expect the population value to be in the neighborhood of this. But whether it's $.7$ or $.5$, I don't see how it changes how we estimate the coefficients of the model. To me it just seems like a fun fact; like I said, I don't see how I'm supposed to use this information during implementation.
The only other thing I could think of is that if $P(y=1)=.0001$, or some similarly small number, we would have a noticeable case of class imbalance, and that could affect classification accuracy (some models are more robust to class imbalance than others). While this is an important consideration, and would probably merit finding $P(y=1)$, I still can't help but wonder whether there is anything else...
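In that imbalanced case, the only use I can picture is reweighting each class inversely to its prior, which is what scikit-learn's class_weight='balanced' does up to a constant factor. A minimal sketch, with the prior softened to $P(y=1)=.05$ so the toy data actually contains positives:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 50_000, 0.05                        # imbalanced prior P(y=1) = .05
y = rng.binomial(1, p, size=n)
X = rng.normal(size=(n, 2)) + y[:, None]   # positives shifted so the task is learnable

# Weight each class by 1 / P(class), i.e. rare positives count more.
weights = {0: 1 / (1 - p), 1: 1 / p}
model = LogisticRegression(class_weight=weights).fit(X, y)
print(model.score(X, y))
```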
Question
How can we best utilize the knowledge of $P(y=1)$ in classification?
Further Clarification:
Answers may expand on $P(y=1)$ as it relates to any of the following:
- heuristic for model accuracy (see the baseline sketch after this list)
- weights for model
- model selection
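On the first bullet, the heuristic I have in mind is the majority-class baseline: any model worth keeping should beat an accuracy of $\max(P(y=1),\, 1 - P(y=1))$. A minimal sketch of that baseline (assuming scikit-learn):

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.7, size=10_000)   # P(y=1) = .7
X = np.zeros((len(y), 1))               # features are irrelevant to this baseline

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
print(baseline.score(X, y))             # ≈ max(.7, .3) = .7
```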