Highest Voted 'under-sampling' Questions - Statistical Analysis Stack Exchange

1

vote

0 answers

Undersampling of datasets and training the model using early stopping

I need some clarification on the undersampling of datasets. I have three datasets: undersampled training data, undersampled validation data, and a test dataset that is not undersampled and is a true representation of the population. My questions are: I…

sampling under-sampling

asked Dec 24 '21 at 00:20

RH1

21
1

1

vote

1 answer

How does R's randomForest sampsize work?

I am working on a predictive model (imbalanced data) and trying to undersample the majority class. I wanted to get a representative sample of my majority class and came across R's randomForest, which has a parameter "sampsize".…

r random-forest unbalanced-classes hyperparameter under-sampling

asked Mar 30 '20 at 07:43

Amarpreet Singh

505
4
15

0

votes

0 answers

Threshold / Ratio to consider undersampling / oversampling

I have a classification task (predicting DNA methylation) with a somewhat unbalanced dataset - 38% of values are in the minority class, and the other 62% in the majority class. I have read that one way to work with unbalanced data is to do…

classification oversampling under-sampling

asked Nov 01 '21 at 17:23

charelf

171
4

0

votes

0 answers

Undersampling a multi-label dataset

I have a multi-label dataset whose label distribution looks something like this, with the label on the x-axis and the number of rows in which it occurs on the y-axis. ## imports import numpy as np import pandas as pd %matplotlib inline from sklearn.datasets…

classification multi-class multilabel under-sampling

asked Oct 26 '21 at 10:10

Naveen Reddy Marthala

207
2
10

0

votes

1 answer

Should models built using under-sampled data be evaluated against the population?

I have a dataset of 11 million rows with a 1:10 ratio between minority and majority classes. To train a model, I have selected all the minority class members and 1/3 of the majority class. The ratio is now 3:10 and the sample data is comprised of…

machine-learning cross-validation validation down-sample under-sampling

asked Sep 07 '21 at 04:04

onejerlo

1
1
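The arithmetic in the excerpt above (a 1:10 ratio becoming 3:10 after keeping one third of the majority class) can be checked with a small sketch. The function name `ratio_after_undersampling` is hypothetical and only illustrates the calculation; it is not taken from the question.

```python
from fractions import Fraction

def ratio_after_undersampling(minority, majority, keep_fraction):
    """Minority:majority ratio after keeping `keep_fraction` of the majority class.

    Hypothetical helper for illustration; all minority rows are assumed kept.
    """
    kept_majority = majority * Fraction(keep_fraction)
    return Fraction(minority) / kept_majority

# Original ratio 1:10, keeping one third of the majority class.
ratio = ratio_after_undersampling(1, 10, Fraction(1, 3))
print(ratio)  # 3/10

# Share of the minority class before and after undersampling.
share_before = Fraction(1, 1 + 10)
share_after = Fraction(3, 3 + 10)
print(share_before, share_after)  # 1/11 3/13
```

The minority share rises from 1/11 to 3/13 in the training sample, so metrics computed on that sample would not reflect the class balance of the full population.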

0

votes

1 answer

Coefficient estimates of logistic regression after downsampling majority class

I used a binary logit model with a lasso regularization term to model an unbalanced dataset, where I undersampled the majority class (the minority class is 2% of observations) to get a 50/50 split of the classes. Now I want to estimate the model coefficients,…

logistic lasso under-sampling

asked May 21 '21 at 21:01

John Locke

1
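A minimal sketch of the standard prior correction that is often applied when the majority class has been downsampled, assuming the majority (negative) class was kept at a known rate `beta` and the minority class was kept in full. The function names are hypothetical, and with a lasso penalty the correction is only approximate, so this is an illustration rather than the asker's method.

```python
import math

def correct_probability(p_sampled, beta):
    """Map a probability predicted on undersampled data back to the original balance.

    beta is the assumed fraction of majority (negative) rows that were kept.
    Downsampling negatives multiplies the odds by 1 / beta, so we undo that.
    """
    return beta * p_sampled / (beta * p_sampled + 1 - p_sampled)

def correct_intercept(b0_sampled, beta):
    """Shift a logistic-regression intercept by log(beta) to undo the odds inflation."""
    return b0_sampled + math.log(beta)

# Population with 2% minority, downsampled to a 50/50 split:
# 2 minority rows per 98 majority rows, so 2 of every 98 majority rows are kept.
beta = 2 / 98

p_corrected = correct_probability(0.5, beta)
b0_corrected = correct_intercept(0.0, beta)
p_from_intercept = 1 / (1 + math.exp(-b0_corrected))

print(round(p_corrected, 4))       # 0.02
print(round(p_from_intercept, 4))  # 0.02
```

A prediction of 0.5 on the balanced sample maps back to the 2% base rate, which is the expected behaviour for an uninformative case.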

0

votes

0 answers

Best (quality/time) undersampling technique

I am working on a very unbalanced dataset (90% to 10%) with around 350,000 records, and am trying various classification methods. I began with SMOTE, which was quite fast and improved performance on tree classifiers (CART) but made it worse with all…

r time-complexity under-sampling

asked Mar 22 '21 at 17:46

Mauro

11
3

0

votes

1 answer

Unbalanced dataset classification problem

I have a binary classification problem and I'm working with an unbalanced dataset. The counts for each class look like this. Training set: Class 0: 29 cases; Class 1: 6246 cases. Test set: Class 0: 2678 cases; Class 1: 12 cases. I…

binary-data unbalanced-classes decision under-sampling

asked Dec 01 '20 at 09:03

notarealgreal

101
1

0

votes

1 answer

What are some "not so common" methods for dealing with unbalanced data?

When we talk about unbalanced data, we usually think about SMOTE, resampling and so on, usually the methods mentioned here: https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets. What other methods have you seen that are not…

unbalanced-classes resampling smote under-sampling

asked Aug 04 '20 at 14:23

Dumb ML

197
6

0

votes

0 answers

What's a range of good F1 scores?

I have watched a lot of videos on machine learning, and they all say different things about F1 scores. One video says that an F1 score of .8 is bad, but another says an F1 score of .4 is excellent. What's up with this? I ran my model with Random Forest…

machine-learning random-forest f1 under-sampling

asked Jul 26 '20 at 03:03

Sriswaroop Koundinya

1
3

0

votes

0 answers

Limits of oversampling

I have a dataset with an event rate of less than 0.3 percent. To improve the modeling results, I did some oversampling using SMOTE. I initially oversampled so that the event rate increases 10 times to 3 percent. But that doesn't feel right. Are…

sampling unbalanced-classes oversampling under-sampling

asked Jul 17 '20 at 18:49

Clock Slave

787
7
21

0

votes

0 answers

What causes the high OOB-error for randomForest() in R?

I'm trying to fit a random forest in R on a dataset with 16364 observations (after undersampling), using the function randomForest(). But my results look really weird: What could have caused this? My data was very unbalanced at first, which is why I…

r random-forest validation under-sampling

asked May 08 '20 at 12:45

AnnieFrannie

139
9

Questions tagged [under-sampling]