Suppose we want to build a binary classifier with a weighted loss, i.e., one that penalizes the two types of errors (false positives and false negatives) differently. However, the software we are using does not support a weighted loss.
Can I hack around this by manipulating my data?
For example, suppose we are working on a fraud detection problem (assume the prior is 50/50 fraud vs. normal here, although most fraud-detection problems are extremely imbalanced), where we can afford some false positives (false alerts on normal transactions) but really want to avoid false negatives (missed frauds).
Let's say we want the loss ratio to be 1:5 (false positive : false negative). Can we simply make 5 copies of each fraud transaction?
Intuitively, duplicating the fraud rows changes the prior distribution, so the model becomes more likely to label a transaction as fraud, and false negatives should go down.
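Here is as far as I can push this intuition formally (taking the FP cost as 1 and the FN cost as 5). The cost-weighted Bayes rule predicts "fraud" when the expected cost of saying "normal" exceeds that of saying "fraud":

$$5\,P(\text{fraud}\mid x) > 1\cdot P(\text{normal}\mid x) \iff P(\text{fraud}\mid x) > \tfrac{1}{6}.$$

After duplicating every fraud row 5 times, the posterior odds under the new prior are the old odds multiplied by 5, so thresholding the new posterior at $1/2$ is the same as thresholding the original posterior at $1/6$:

$$P'(\text{fraud}\mid x) > \tfrac{1}{2} \iff 5\,\frac{P(\text{fraud}\mid x)}{P(\text{normal}\mid x)} > 1 \iff P(\text{fraud}\mid x) > \tfrac{1}{6}.$$

So at the level of the Bayes decision rule the two seem to coincide, but I can't tell whether this carries over to the minimizer of a surrogate loss.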
My guess is that if we are truly minimizing the 0-1 loss, this trick works, but if we are minimizing a proxy such as the logistic or hinge loss (see this post), then it will not work well.
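To make the duplication hack concrete, here is a toy sketch (using scikit-learn's `LogisticRegression` on made-up data, since scikit-learn, unlike my hypothetical software, does support per-sample weights). Because the fitted objective is a sum of per-example losses, duplicating each fraud row 5 times produces literally the same objective as weighting those rows by 5:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Made-up 50/50 toy data: label 1 = fraud, 0 = normal.
n = 200
X = rng.normal(size=(n, 2))
y = (X[:, 0] + rng.normal(size=n) > 0).astype(int)

# The hack: add 4 extra copies of every fraud row (5 copies total).
X_dup = np.vstack([X, np.repeat(X[y == 1], 4, axis=0)])
y_dup = np.concatenate([y, np.repeat(y[y == 1], 4)])

# The "real" weighted loss: weight 5 on fraud rows, 1 on normal rows.
w = np.where(y == 1, 5.0, 1.0)

clf_dup = LogisticRegression().fit(X_dup, y_dup)
clf_w = LogisticRegression().fit(X, y, sample_weight=w)

# The two objectives are identical sums, so the fits coincide.
print(np.allclose(clf_dup.coef_, clf_w.coef_, atol=1e-3))
```

So for logistic regression the data manipulation reproduces the weighted surrogate loss exactly; my question is whether minimizing that weighted surrogate actually gives me the 1:5 behavior I want on the 0-1 loss.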
Is there a formal/mathematical explanation?