I have a highly imbalanced two-class dataset: for example, 4000 samples, of which only 20 belong to the positive class.

My idea is to split the data as follows:

Train = 2000 samples (50%: 10 positive samples and 1990 negative samples).

Test = 1000 samples (25%: 5 positive samples and 995 negative samples).

Validate = 1000 samples (25%: 5 positive samples and 995 negative samples).
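To make the split concrete, here is a minimal sketch of how I would produce it, assuming a plain list of 0/1 labels. The helper `stratified_split` and its parameters are hypothetical names of my own, not part of any library; it shuffles each class separately and cuts it by the requested fractions, so every sample lands in exactly one split.

```python
import random

def stratified_split(labels, fractions, seed=0):
    """Split sample indices into named parts, keeping class proportions.

    labels: list of class labels (0 = negative, 1 = positive).
    fractions: dict mapping split name -> fraction (should sum to 1).
    Sampling is without replacement: each index appears in exactly one split.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    names = list(fractions)
    splits = {name: [] for name in names}
    for idxs in by_class.values():
        rng.shuffle(idxs)
        start = 0
        for i, name in enumerate(names):
            if i == len(names) - 1:
                # The last split takes the remainder so no index is dropped.
                end = len(idxs)
            else:
                end = start + round(fractions[name] * len(idxs))
            splits[name].extend(idxs[start:end])
            start = end
    return splits

# 20 positive and 3980 negative samples, as in the example above.
labels = [1] * 20 + [0] * 3980
splits = stratified_split(labels, {"train": 0.5, "validate": 0.25, "test": 0.25})

# (number of positives, total size) per split.
counts = {name: (sum(labels[i] for i in idx), len(idx)) for name, idx in splits.items()}
print(counts)
```

With these numbers each class divides evenly, so the three splits come out with 10, 5, and 5 positive samples respectively.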

As I understand it, I should draw the Test samples without replacement. The remaining samples will then be used to prepare the Train and Validate sets as described by this diagram.

Diagram of data preparation for Few-shot

For the Test data, do I need to sample it the same way as Train and Validate, dividing it into a Support set and a Query set, or should I create only a Query set?
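For concreteness, this is roughly what I mean by dividing data into a Support and a Query set. It is only a sketch under my own assumptions: a 2-way episode, and the hypothetical names `sample_episode`, `k_shot`, and `n_query`, with values chosen purely for illustration.

```python
import random

def sample_episode(pos_idx, neg_idx, k_shot, n_query, rng):
    """Draw one 2-way episode from the given positive and negative pools.

    Each class contributes k_shot support indices and n_query query indices.
    Support and query are disjoint within an episode (sampled without
    replacement), but different episodes may reuse the same indices.
    """
    episode = {"support": [], "query": []}
    for pool in (pos_idx, neg_idx):
        chosen = rng.sample(pool, k_shot + n_query)
        episode["support"].extend(chosen[:k_shot])
        episode["query"].extend(chosen[k_shot:])
    return episode

rng = random.Random(0)
train_pos = list(range(10))        # stand-ins for the 10 positive Train indices
train_neg = list(range(10, 2000))  # stand-ins for the 1990 negative Train indices

# Three episodes (tasks), each with its own support and query set.
tasks = [sample_episode(train_pos, train_neg, k_shot=3, n_query=2, rng=rng)
         for _ in range(3)]
```

With only 10 positive Train samples, the positive indices will necessarily repeat across episodes.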

In the diagram, are task1, task2, and task3 in the red rectangle the same thing as a mini-batch?

May I have your suggestions?

  • Welcome to Cross Validated! Unless you have a high signal-to-noise ratio (obvious patterns), you simply do not have enough instances of the minority class. The good news, however, is that your class ratio of $199:1$ will not present a problem once you obtain more data, even if your new data feature the same class imbalance. The issue is a lack of observations, [not the ratio itself](https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he). – Dave Dec 20 '21 at 14:59
