Incorporating Prior Class Probability Distribution in Logistic Regression

Question

I am amazed that I cannot find any articles or lectures about how to incorporate prior class probability distributions into classifiers like logistic regression or random forests.

So my question is:

How can I incorporate a prior class probability distribution in logistic regression or random forests?

Does Incorporating Prior Class Probability Distribution imply that I should use Bayesian machinery?

I am facing a classification task where I know that class a is much more likely than class b.

An ad hoc solution would be to just include more samples of class a in the training set, but are there any theoretical results on this?

One thing I thought about was changing the decision threshold from 0.5 to a value that takes this prior imbalance into account. But I am not even sure that makes sense theoretically, because by the time I am ready to make a decision I have already looked at all the feature values, so I should care not about the prior probability but about the class conditional probability.

dsaxton · Answer 1 · 2016-03-03T14:53:09.880

Let $Y$ be the binary response variable and $X$ the vector of predictors with density $f$ (which would either be continuous, discrete or a combination of both). Note that

$$ \frac{P(Y = 1 \mid X = x)}{P(Y = 0 \mid X = x)} = \frac{P(Y = 1) f_{X \mid Y=1}(x)}{P(Y = 0) f_{X \mid Y=0}(x)} $$

and so

$$ \log \left ( \frac{P(Y = 1 \mid X = x)}{P(Y = 0 \mid X = x)} \right ) = \log \left ( \frac{P(Y = 1)}{P(Y = 0)} \right ) + \log \left ( \frac{f_{X \mid Y=1}(x)}{f_{X \mid Y=0}(x)} \right ) . $$

This means that under a logistic regression model the logarithm of the prior odds of the event $\{ Y = 1 \}$ appears as an additive constant in the conditional log odds. What you might consider, then, is an intercept adjustment in which you subtract the log of the empirical odds (the logit of the event rate in the training data) and add the log of the prior odds. But assuming the prior probability is accurate (that is, close to the event rate in the training data), this is not expected to have much of an effect on the model. This type of adjustment is made primarily after some sampling procedure that artificially alters the proportion of events in the data.
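The intercept adjustment described above can be sketched in a few lines of Python. This is only an illustration: the names `train_rate` (event rate in the training data) and `prior` (event rate you believe holds in the target population) are assumptions introduced here, and the adjustment is applied to a fitted intercept or, equivalently, to a predicted probability.

```python
import math

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))

def adjust_intercept(intercept, train_rate, prior):
    """Subtract the empirical log odds and add the prior log odds."""
    return intercept - logit(train_rate) + logit(prior)

def adjust_probability(p, train_rate, prior):
    """Apply the same shift to a single predicted probability."""
    return sigmoid(logit(p) - logit(train_rate) + logit(prior))

# Hypothetical numbers: model fitted on a 50/50 resampled training set,
# while the event is believed to occur 10% of the time in practice.
print(adjust_probability(0.5, train_rate=0.5, prior=0.1))
```

When `train_rate` equals `prior` the shift is zero and predictions are unchanged, which matches the remark that the adjustment matters mainly after resampling has altered the proportion of events.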

score 3 · Answer 2 · edited Apr 13 '17 at 12:44

For a random forest, the default prior is the empirical class distribution of the training set. You would want to adjust this prior when you expect the training-set class distribution to be far from that of new test observations. The prior can be adjusted by stratification/downsampling or class_weights.

Stratification/downsampling does not mean that some observations are discarded; they will just be bootstrapped into fewer root nodes.
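A minimal sketch of that point, using only the Python standard library rather than any particular random forest implementation: each tree gets its own stratified bootstrap sample with a fixed count per class, so the majority class is downsampled per tree, yet across many trees essentially every observation is used. The label counts, the 1:1 ratio, and the function name `stratified_bootstrap` are assumptions made for illustration.

```python
import random

def stratified_bootstrap(labels, per_class, rng):
    """Draw one bootstrap sample of row indices with a fixed count per class."""
    sample = []
    for cls, n in per_class.items():
        members = [i for i, y in enumerate(labels) if y == cls]
        sample.extend(rng.choices(members, k=n))  # sampling with replacement
    return sample

rng = random.Random(0)
labels = ["a"] * 100 + ["b"] * 10   # hypothetical imbalanced training labels
per_class = {"a": 10, "b": 10}      # hypothetical 1:1 stratification ratio

# One sample per tree; 200 trees is an arbitrary choice for the sketch.
samples = [stratified_bootstrap(labels, per_class, rng) for _ in range(200)]
used = {i for s in samples for i in s}
print(len(used), "of", len(labels), "observations appear in at least one sample")
```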

Besides adjusting the prior, it is also possible to obtain probabilistic predictions from the random forest model and choose a threshold of certainty.

In practice, I find that a mix of adjusting priors by stratification and choosing the best threshold performs best. Use ROC plots to decide on thresholds. Adjusting class_weights will likely give similar performance, but it is less transparent what the effective prior becomes. For stratification, the ratio of stratification is simply the new prior.
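As a rough illustration of picking a threshold from ROC information, the sketch below computes the ROC points for a set of predicted probabilities and selects the threshold that maximizes the true positive rate minus the false positive rate (Youden's J). That criterion is an assumption chosen for the example, not something prescribed above; in practice the choice depends on the relative costs of the two error types. The labels and scores are made up.

```python
def roc_points(y_true, scores):
    """Return (threshold, tpr, fpr) per distinct score; predict 1 when score >= threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, tp / pos, fp / neg))
    return points

def best_threshold(y_true, scores):
    """Threshold maximizing tpr - fpr (ties go to the highest threshold)."""
    return max(roc_points(y_true, scores), key=lambda p: p[1] - p[2])[0]

# Hypothetical held-out labels and predicted probabilities of class 1.
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
print(best_threshold(y_true, scores))
```

The threshold should be chosen on held-out data (or out-of-bag predictions), not on the data the forest was fitted to.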
