13

Can somebody give me a brief explanation of the differences between these two resampling methods: ROSE and SMOTE?

Ferdi
Martin

1 Answer

18

ROSE uses smoothed bootstrapping to draw artificial samples from the feature space neighbourhood around the minority class.

SMOTE draws artificial samples by choosing points that lie on the line connecting the rare observation to one of its nearest neighbors in the feature space.

Source: Training and assessing classification rules with unbalanced data
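To make the two descriptions above concrete, here is a minimal sketch of how a single synthetic point could be generated under each scheme. It is an illustration under stated assumptions, not the code of any package: the function names, the toy `minority` data, and the hand-picked `bandwidths` are all made up for this example, the ROSE part assumes a Gaussian kernel with a diagonal bandwidth, and only the minority class is shown.

```python
import math
import random


def nearest_neighbours(point, others, k):
    """Return the k points in `others` closest to `point` (Euclidean distance)."""
    return sorted(others, key=lambda o: math.dist(point, o))[:k]


def smote_sample(point, neighbours, rng):
    """SMOTE-style draw: interpolate between `point` and one random neighbour."""
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the connecting segment, in [0, 1)
    return [p + gap * (n - p) for p, n in zip(point, neighbour)]


def rose_sample(point, bandwidths, rng):
    """ROSE-style draw: sample from a Gaussian kernel centred on `point`.

    `bandwidths` holds one standard deviation per feature (assumed here,
    not estimated from the data).
    """
    return [rng.gauss(p, h) for p, h in zip(point, bandwidths)]


rng = random.Random(0)

# Hypothetical minority-class observations with two features.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]

seed = rng.choice(minority)
others = [m for m in minority if m is not seed]

synthetic_smote = smote_sample(seed, nearest_neighbours(seed, others, k=2), rng)
synthetic_rose = rose_sample(seed, bandwidths=[0.3, 0.3], rng=rng)

print("SMOTE:", synthetic_smote)
print("ROSE: ", synthetic_rose)
```

The SMOTE point always lies on the segment between two observed points, while the ROSE point can land anywhere the kernel puts probability mass, including outside the range of the observed data.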

My experience: I used both techniques to create balanced data, and found that SMOTE (from R's DMwR package) produced better results. The reason is, in my opinion, that SMOTE doesn't create as many 'unrealistic' values as ROSE. ROSE gave me values that were outright impossible (negative area sizes or elevations). You can specify the neighbourhood from which ROSE draws its samples, which mitigates this problem to some extent. But SMOTE still produced better training data for predicting on my original (imbalanced) data. Both techniques outperformed plain over- and undersampling, though.
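The 'impossible values' effect is easy to reproduce on a toy example. The sketch below is only an illustration: the `areas` values and both bandwidths are invented, the SMOTE step interpolates between any two observed values rather than true nearest neighbours, and the ROSE step assumes a Gaussian kernel. In the R ROSE package, the neighbourhood size appears to be controlled by the `hmult.mino` and `hmult.majo` shrink factors; the `bandwidth` argument here merely stands in for that idea.

```python
import random

rng = random.Random(42)

# Hypothetical minority-class "area" values; all strictly positive.
areas = [0.5, 0.8, 1.0, 2.0]


def smote_1d(values, rng):
    """Interpolate between two distinct observed values (simplified SMOTE)."""
    a, b = rng.sample(values, 2)
    return a + rng.random() * (b - a)


def rose_1d(values, bandwidth, rng):
    """Gaussian smoothed bootstrap: pick a value, then add kernel noise."""
    return rng.gauss(rng.choice(values), bandwidth)


def count_negative(samples):
    """Number of physically impossible (negative) synthetic values."""
    return sum(1 for s in samples if s < 0)


n = 1000
smote_neg = count_negative([smote_1d(areas, rng) for _ in range(n)])
rose_wide_neg = count_negative([rose_1d(areas, 1.0, rng) for _ in range(n)])
rose_narrow_neg = count_negative([rose_1d(areas, 0.1, rng) for _ in range(n)])

print("negative values out of", n)
print("SMOTE:", smote_neg)
print("ROSE, wide kernel:", rose_wide_neg)
print("ROSE, narrow kernel:", rose_narrow_neg)
```

Interpolation between positive values can never go negative, so the SMOTE count is zero by construction. The wide kernel produces a noticeable share of negative areas, and shrinking the kernel reduces them at the cost of synthetic points that sit closer to the originals.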

Achu Mani
    Hi, sorry for bringing up an old thread, but in ROSE how do you optimize the dispersion? The [paper](https://link.springer.com/content/pdf/10.1007/s10618-012-0295-5.pdf) seems to minimize the AMISE, but it does not give any detail that I can implement in my code. – Darren Christopher Dec 03 '21 at 02:12