Two common techniques for handling unbalanced classes in a binary classification problem are:

1. under-/over-sampling: remove elements from the majority class / add elements to the minority class (drawn from some distribution);
2. class weighting: use a loss that penalizes mistakes on the minority class more heavily than mistakes on the majority class.
My question is: are 1 and 2 formally equivalent under some assumptions on how we undersample/oversample?
For concreteness, consider logistic regression, a dataset with highly unbalanced classes, and binary cross entropy (and its weighted version) as the loss. Consider two different setups:
1. draw $N$ samples from a distribution $P$; concatenate these samples with the training set; train the model; predict on the test set;
2. use $(a_0,a_1)$ as weights in the weighted cross entropy; train on the original training set; predict on the test set.
Is there a relationship between $(N,P)$ and $(a_0,a_1)$ such that the two setups would converge to the same predictions as the size of the dataset goes to infinity?
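To make the comparison concrete, here is a minimal sketch of the two setups, assuming scikit-learn's `LogisticRegression`; the toy data, the oversampling size `N`, and the candidate weights `(a0, a1)` are placeholders I chose for illustration, not part of the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy imbalanced data: class 1 is the minority (placeholder generation).
n0, n1, d = 10_000, 200, 5
X = np.vstack([rng.normal(0.0, 1.0, (n0, d)), rng.normal(0.5, 1.0, (n1, d))])
y = np.concatenate([np.zeros(n0), np.ones(n1)])

# Setup 1: over-sample the minority class by drawing N samples from its
# empirical distribution P (i.e. sampling existing minority rows with replacement).
N = 5_000
minority_idx = np.where(y == 1)[0]
extra = rng.choice(minority_idx, size=N, replace=True)
X_over = np.vstack([X, X[extra]])
y_over = np.concatenate([y, y[extra]])
model_over = LogisticRegression(max_iter=1000).fit(X_over, y_over)

# Setup 2: keep the original data and weight the cross entropy with (a_0, a_1).
# The value of a1 below is only my guess at a "matching" weight: each minority
# point is counted (n1 + N) / n1 times on average in setup 1.
a0, a1 = 1.0, (n1 + N) / n1
model_weighted = LogisticRegression(max_iter=1000, class_weight={0: a0, 1: a1}).fit(X, y)

# Compare predicted probabilities on some held-out points.
X_test = rng.normal(0.25, 1.0, (10, d))
print(model_over.predict_proba(X_test)[:, 1])
print(model_weighted.predict_proba(X_test)[:, 1])
```

One caveat with this sketch: `LogisticRegression` applies L2 regularization by default, which could by itself break an exact equivalence; it can be disabled (`penalty=None` in recent scikit-learn versions, `penalty='none'` in older ones) if one wants to isolate the effect of sampling versus weighting.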
My intuition is that over-sampling the minority class by drawing new samples from its empirical distribution sounds similar to changing the weight of that class. However, it is not obvious that the two are formally equivalent.
Over-sampling can change the posterior probabilities the model produces, because the model is effectively trained on a (more) balanced dataset. However, it is far from obvious that weighting does not do the same thing in a less direct way.
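To state the intuition slightly more precisely, here is the back-of-the-envelope argument I have in mind, as a sketch under the assumptions that $P$ is the empirical distribution of the minority class, sampling is with replacement, and $\ell(x, y; \theta)$ denotes the per-example cross entropy on a training set with $n_1$ minority examples:

$$
\mathbb{E}_P\!\left[\sum_{i=1}^{n}\ell(x_i, y_i; \theta) + \sum_{j=1}^{N}\ell(\tilde{x}_j, \tilde{y}_j; \theta)\right]
= \sum_{i:\, y_i=0}\ell(x_i, 0; \theta) + \left(1 + \frac{N}{n_1}\right)\sum_{i:\, y_i=1}\ell(x_i, 1; \theta).
$$

So, in expectation over the resampling, setup 1 looks like setup 2 with $(a_0, a_1) = (1,\, 1 + N/n_1)$. Whether this expectation-level identity is enough to conclude that the two fitted models converge to the same predictions is exactly the part I am unsure about.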