Questions tagged [under-sampling]
12 questions
1
vote
0 answers
Undersampling of datasets and training the model using early stopping
I need some clarification on the undersampling of datasets.
I have 3 datasets: undersampled training data, undersampled validation data, and a test dataset that is not undersampled and is a true representation of the population. My questions are:
I…

RH1
1
vote
1 answer
How does R's randomForest sampsize work?
I am working on a predictive model (imbalanced data) and trying to undersample the majority class data. I wanted to get a representative sample of my majority class and came across R's randomForest, which has a parameter "sampsize".…

Amarpreet Singh
0
votes
0 answers
Threshold / Ratio to consider undersampling / oversampling
I have a classification task (predicting DNA methylation) with a somewhat unbalanced dataset - 38% of values are in the minority class, and the other 62% in the majority class.
I have read that one way to work with unbalanced data is to do…

charelf
0
votes
0 answers
Undersampling a multi-label dataset
I have a multi-label dataset whose label distribution looks something like this, with the label on the x-axis and the number of rows it occurs in on the y-axis.
## imports
import numpy as np
import pandas as pd
%matplotlib inline
from sklearn.datasets…

Naveen Reddy Marthala
0
votes
1 answer
Should models built using under-sampled data be evaluated against the population?
I have a dataset of 11 million rows with a 1:10 ratio between minority and majority classes.
To train a model, I have selected all the minority class members and 1/3 of the majority class.
The ratio is now 3:10 and the sample data is comprised of…

onejerlo
0
votes
1 answer
Coefficient estimates of logistic regression after downsampling majority class
I fit a binary logit model with a lasso regularization term on an unbalanced dataset, undersampling the majority class (the minority class is 2% of observations) to get a 50/50 split of the classes.
Now I want to estimate the model coefficients,…
0
votes
0 answers
Best (quality/time) undersampling technique
I am working on a very unbalanced dataset (90% to 10%) with around 350,000 records, and am trying various classification methods. I began with SMOTE, which was quite fast and improved performance on tree classifiers (CART) but made it worse with all…

Mauro
0
votes
1 answer
Unbalanced dataset classification problem
I have a binary classification problem and I'm working with an unbalanced dataset. The count for each class in the training set looks like:
Training set:
Class 0: 29 cases
Class 1: 6246 cases
Test set:
Class 0: 2678 cases
Class 1: 12 cases
I…

notarealgreal
0
votes
1 answer
What are some "not so common" methods for dealing with unbalanced data?
When we talk about unbalanced data, we usually think about SMOTE, resampling and so on. Usually the methods mentioned here: https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets.
What are other methods you've seen that are not…

Dumb ML
0
votes
0 answers
What's a range of good F1 scores?
I have watched a lot of videos on machine learning and in terms of F1 scores, all are different. One video says that an F1 score of .8 is bad, but another says an F1 score of .4 is excellent. What's up with this?
I ran my model with Random Forest…
0
votes
0 answers
Limits of oversampling
I have a dataset with an event rate of less than 0.3 percent. To improve the modeling results, I did some oversampling using SMOTE.
I initially oversampled so that the event rate increases 10 times to 3 percent. But that doesn't feel right. Are…

Clock Slave
0
votes
0 answers
What causes the high OOB-error for randomForest() in R?
I'm trying to fit a random forest in R on a dataset with 16364 observations (after undersampling), using the function randomForest().
But my results look really weird:
What could have caused this?
My data was very unbalanced at first, which is why I…

AnnieFrannie