
I have a project with about 14000 CSV files (about 12000 in class 0 and 2000 in class 1). Each CSV contains 365 columns and 3330 rows, and every value is either 0 or 1.
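A minimal sketch for reading the data with NumPy, assuming each CSV has no header row and holds only comma-separated 0/1 integers. The question does not say how file paths and labels are stored, so `paths` and `labels` are hypothetical inputs. One file holds 3330 × 365 = 1,215,450 values, so all 14000 files stored as `uint8` would need about 17 GB, which is why the helper below loads one chunk of files at a time.

```python
import numpy as np

N_ROWS, N_COLS = 3330, 365  # shape of every file, as described in the question

def load_csv_batch(paths, n_rows=N_ROWS, n_cols=N_COLS):
    """Stack 0/1 CSV files into a uint8 array of shape (len(paths), n_rows, n_cols).

    Assumes no header row and comma-separated integer values.
    """
    batch = np.empty((len(paths), n_rows, n_cols), dtype=np.uint8)
    for i, path in enumerate(paths):
        arr = np.loadtxt(path, delimiter=",", dtype=np.uint8, ndmin=2)
        if arr.shape != (n_rows, n_cols):
            raise ValueError(f"{path}: expected {(n_rows, n_cols)}, got {arr.shape}")
        batch[i] = arr
    return batch

def iter_batches(paths, labels, batch_size=32, **shape):
    """Yield (X, y) chunks so the full dataset (~17 GB as uint8) is never in memory at once."""
    labels = np.asarray(labels)
    for start in range(0, len(paths), batch_size):
        stop = start + batch_size
        yield load_csv_batch(paths[start:stop], **shape), labels[start:stop]
```

Each `(X, y)` chunk can then be passed to one training step of whatever model is used, instead of loading every file up front.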

1. Is there any sample code for this kind of data? (My own attempts give poor results...)

2. Is there any way to solve the imbalance problem? (I've looked into SMOTE, but found nothing about applying SMOTE to data in this form. How do I apply SMOTE to a batch of CSV files?)

3. I'm not sure how to build a model that fits this kind of data; every model I've built so far gives poor results. I'd appreciate any suggestions.
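A minimal sketch for questions 2 and 3, under stated assumptions. SMOTE creates synthetic minority samples by interpolating between a minority sample and its nearest minority neighbours, so applied here it would typically have to treat each file as one flattened vector of 1,215,450 values. A simpler alternative shown below is to weight each training sample by inverse class frequency (the same formula scikit-learn uses for `class_weight="balanced"`). As a hypothetical baseline, each file is collapsed to the fraction of 1s in each of its 365 columns, and a class-weighted logistic regression is fitted with plain NumPy gradient descent. The feature choice and the hyperparameters `lr` and `epochs` are illustrative assumptions, not a tuned model.

```python
import numpy as np

def balanced_class_weights(labels):
    """Weight for class c = n_samples / (n_classes * count_c).

    Same formula as scikit-learn's class_weight="balanced".
    """
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def column_mean_features(batch):
    """Hypothetical baseline features: fraction of 1s in each column of each file.

    batch has shape (n_files, n_rows, n_cols); the result has shape (n_files, n_cols).
    """
    return np.asarray(batch, dtype=float).mean(axis=1)

def fit_weighted_logreg(X, y, class_weight, lr=0.5, epochs=500):
    """Logistic regression trained by full-batch gradient descent on a class-weighted log loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sw = np.array([class_weight[int(c)] for c in y])  # per-sample weights
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = sw * (p - y) / sw.sum()  # gradient of the weighted mean log loss w.r.t. each logit
        w -= lr * (X.T @ g)
        b -= lr * g.sum()
    return w, b

def predict_proba(X, w, b):
    """Predicted probability of class 1."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ w + b)))
```

With the class counts from the question (12000 vs 2000), `balanced_class_weights` gives about 0.583 for class 0 and 3.5 for class 1. If the model is trained with Keras instead, the same dictionary can be passed as the `class_weight` argument of `Model.fit`.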

Thanks!!

Chevady Ju
  • Good news! Class imbalance is not a problem: https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he. – Dave Feb 26 '21 at 11:07
  • @Dave Thanks for sharing, but I already know about that. The thing is how to do it; I'm very confused about this. – Chevady Ju Feb 26 '21 at 11:26
  • “but I know them” You know what? If you already know what Kolassa posted about class imbalance, then what’s up with your question #2? // “I’m very confused with this” You’re confused with what? – Dave Feb 26 '21 at 11:50
  • Please read my question again. I'm asking how to do it (method and sample code), not what an imbalanced dataset is. – Chevady Ju Feb 26 '21 at 11:58
  • Of course you know what an imbalanced data set is. Do you understand the link about why it isn’t a problem? – Dave Feb 26 '21 at 12:04
  • 1. If it's not a problem, then why do methods like SMOTE and ADASYN exist? 2. If imbalance is not the problem, then what should you do in this situation? (I can't get more data, and I've already tried adjusting the model a lot.) 3. Based on my output (loss, acc, val_loss, val_acc, ROC), and from what I've read, it's because of the imbalance. How do you prove it's not? – Chevady Ju Feb 26 '21 at 12:41
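On point 3 in the comments (whether poor loss/accuracy numbers prove the imbalance is to blame), one way to check is to compare against a trivial baseline. A minimal NumPy sketch, with illustrative function names: a "model" that always predicts class 0 already reaches 12000 / 14000 ≈ 85.7% accuracy, while its ROC AUC is exactly 0.5. ROC AUC measures how often a class-1 file is scored above a class-0 file and does not depend on the class ratio, so it separates "the model ranks nothing" from "the classes are unequal in size".

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def roc_auc(y_true, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formula; tied scores share their average rank."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)          # last 1-based rank in each tie group
    starts = ends - counts + 1        # first 1-based rank in each tie group
    ranks = ((starts + ends) / 2.0)[inverse]
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return float((ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

# Class counts from the question; the "model" always predicts class 0.
y = np.concatenate([np.zeros(12000, dtype=int), np.ones(2000, dtype=int)])
always_zero = np.zeros_like(y)
print(accuracy(y, always_zero))                  # ≈ 0.857 (12000 / 14000)
print(roc_auc(y, always_zero.astype(float)))     # 0.5
```

A trained model whose validation accuracy sits near 0.857 is therefore not necessarily broken or necessarily good; its ROC AUC (or log loss) relative to these baselines is more informative.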
