Two common techniques for handling unbalanced classes in a binary classification problem are:

1. under-/over-sampling: remove elements from the majority class / add elements to the minority class (drawn from some distribution);
2. class weighting: use a loss that penalizes mistakes on the minority class more heavily than mistakes on the majority class.
My question is: are 1 and 2 formally equivalent under some assumptions on how we undersample/oversample?
For concreteness, consider logistic regression, a dataset with highly unbalanced classes, and binary cross entropy (and its weighted version) as the loss. Consider two different setups:
1. draw $N$ samples from a distribution $P$; concatenate these samples with the training set; train the model; predict on the test set;
2. use $(a_0,a_1)$ as weights in the weighted cross entropy; train on the original training set; predict on the test set.
Is there a relationship between $(N,P)$ and $(a_0,a_1)$ such that the two setups would converge to the same predictions as the size of the dataset goes to infinity?
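To make the comparison concrete, here is a minimal sketch of the two setups, assuming scikit-learn's `LogisticRegression`; the toy data, the oversampling size `N`, and the candidate weights `(a0, a1)` are placeholders I chose for illustration, not part of the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy imbalanced data: class 1 is the minority (placeholder generation).
n0, n1, d = 10_000, 200, 5
X = np.vstack([rng.normal(0.0, 1.0, (n0, d)), rng.normal(0.5, 1.0, (n1, d))])
y = np.concatenate([np.zeros(n0), np.ones(n1)])

# Setup 1: over-sample the minority class by drawing N samples from its
# empirical distribution P (i.e. sampling existing minority rows with replacement).
N = 5_000
minority_idx = np.where(y == 1)[0]
extra = rng.choice(minority_idx, size=N, replace=True)
X_over = np.vstack([X, X[extra]])
y_over = np.concatenate([y, y[extra]])
model_over = LogisticRegression(max_iter=1000).fit(X_over, y_over)

# Setup 2: keep the original data and weight the cross entropy with (a_0, a_1).
# The value of a1 below is only my guess at a "matching" weight: each minority
# point is counted (n1 + N) / n1 times on average in setup 1.
a0, a1 = 1.0, (n1 + N) / n1
model_weighted = LogisticRegression(max_iter=1000, class_weight={0: a0, 1: a1}).fit(X, y)

# Compare predicted probabilities on some held-out points.
X_test = rng.normal(0.25, 1.0, (10, d))
print(model_over.predict_proba(X_test)[:, 1])
print(model_weighted.predict_proba(X_test)[:, 1])
```

One caveat with this sketch: `LogisticRegression` applies L2 regularization by default, which could by itself break an exact equivalence; it can be disabled (`penalty=None` in recent scikit-learn versions, `penalty='none'` in older ones) if one wants to isolate the effect of sampling versus weighting.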
My intuition is that over-sampling the minority class by drawing new samples from its empirical distribution sounds similar to changing the weight of that class. However, it is not obvious that the two are formally equivalent.
Over-sampling can change the posterior probabilities the model produces, because the model is effectively trained on a (more) balanced dataset. However, it is far from obvious that weighting does not do the same thing in a less direct way.
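To state the intuition slightly more precisely, here is the back-of-the-envelope argument I have in mind, as a sketch under the assumptions that $P$ is the empirical distribution of the minority class, sampling is with replacement, and $\ell(x, y; \theta)$ denotes the per-example cross entropy on a training set with $n_1$ minority examples:

$$
\mathbb{E}_P\!\left[\sum_{i=1}^{n}\ell(x_i, y_i; \theta) + \sum_{j=1}^{N}\ell(\tilde{x}_j, \tilde{y}_j; \theta)\right]
= \sum_{i:\, y_i=0}\ell(x_i, 0; \theta) + \left(1 + \frac{N}{n_1}\right)\sum_{i:\, y_i=1}\ell(x_i, 1; \theta).
$$

So, in expectation over the resampling, setup 1 looks like setup 2 with $(a_0, a_1) = (1,\, 1 + N/n_1)$. Whether this expectation-level identity is enough to conclude that the two fitted models converge to the same predictions is exactly the part I am unsure about.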