I had the same issue a few weeks ago.
What you want to do is:
- first clean your dataframe (drop_duplicates, etc.)
- resample (downsample) the class(es) that have more samples
--> If class A makes up 85% of y and class B the remaining 15%,
you can downsample class A. You will drop samples from class A, but you will get a better ratio between A and B.
from sklearn.utils import resample
# Number of samples in the smallest class, computed on the cleaned dataframe (new_df)
min_value = int(new_df.target.value_counts().min())
# Split the classes into two populations, so you can resample whichever one you want
pop1 = new_df[new_df.target == 0].sample(min_value)
pop2 = new_df[new_df.target == 1]
# Here I downsampled pop2 from over 4000 samples to 1500 (without replacement)
pop2 = resample(pop2, replace=False, n_samples=1500)
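If it helps, here is a minimal end-to-end sketch of the steps above on a made-up dataframe. The column names, the 850/150 class split and the random_state values are my own assumptions for illustration, not taken from your data. It downsamples the majority class to the size of the minority class and then puts the two populations back together.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical toy data: 850 rows of class 1 (majority), 150 rows of class 0 (minority)
df = pd.DataFrame({
    "feature": range(1000),
    "target": [1] * 850 + [0] * 150,
})

# Step 1: clean the dataframe
new_df = df.drop_duplicates()

# Step 2: find the minority class and its size
counts = new_df["target"].value_counts()
minority_label = counts.idxmin()
min_value = int(counts.min())

# Split the classes into two populations
minority = new_df[new_df["target"] == minority_label]
majority = new_df[new_df["target"] != minority_label]

# Downsample the majority class (without replacement) to the minority size
majority_down = resample(majority, replace=False, n_samples=min_value, random_state=42)

# Recombine and shuffle so the classes are not grouped together
balanced = (
    pd.concat([minority, majority_down])
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)

print(balanced["target"].value_counts())
```

You do not have to go all the way to a 50/50 split. Passing a larger n_samples (as I did with 1500) keeps more of the majority class while still improving the ratio.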
Hopefully this will help you.