Percentage of positive target values in dataset and the train/test split

Asked Mar 08 '21 at 12:25

Active Mar 08 '21 at 13:06

Suppose that in my dataset of 100 observations, only 25 have a target value of 1, while the other 75 have a target value of 0. Should the proportion of positive target values affect my choice of train/test split size? In other words, should the proportion $p$ of the 100 observations assigned to the training set be a function of the proportion $p'$ of the 100 observations that have positive target values?

Edit: the numbers 100, 25, and 75 are chosen for simplicity, but I am more interested in the general case.

edited Mar 08 '21 at 13:06

asked Mar 08 '21 at 12:25

DavidSilverberg

No. That would simply be over-/undersampling, [which is not useful, even for unbalanced datasets (which are usually not a problem)](https://stats.stackexchange.com/q/357466/1352). – Stephan Kolassa Mar 08 '21 at 12:55
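To make the distinction concrete, here is a minimal sketch (not something proposed in the question or the comment) of a stratified split, a common technique that keeps the class proportion $p'$ the same in both sets whatever training fraction $p$ is chosen. The helper name `stratified_split` and its parameters are hypothetical, and the data are the 25/75 example from the question.

```python
import random


def stratified_split(labels, train_fraction, seed=0):
    """Hypothetical helper: split indices class by class.

    Each class contributes roughly `train_fraction` of its members to the
    training set, so the class balance is preserved in both sets.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        # Indices of all observations belonging to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)


# Example data from the question: 25 positives, 75 negatives.
labels = [1] * 25 + [0] * 75

# Two different training fractions p, chosen without reference to p'.
for p in (0.6, 0.8):
    train_idx, test_idx = stratified_split(labels, p)
    train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
    test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
    print(p, len(train_idx), len(test_idx), train_rate, test_rate)
```

With these numbers, both choices of $p$ leave the positive rate at 0.25 in the training set and in the test set, so in this sketch the split size and the class balance are handled independently.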

0 Answers