
When our dataset has 5 or more attributes, how does the SMOTE algorithm produce a new synthetic sample? And how is the Euclidean distance calculated when there are 5 or more attributes?

Pitouille
user346917



Euclidean distance $d$ between vectors $x,y\in\mathbb R^n$ is:

$$ d(x,y)= \sqrt{ \sum_{i=1}^n \bigg( x_i-y_i \bigg)^2 } $$

If the dimension is $2$, that’s the formula. If the dimension is $5$, that’s the formula. If the dimension is $1234567890987654321$, that’s the formula.
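To make this concrete, here is a small sketch that applies the formula above to two hypothetical samples with 5 attributes each. The function name `euclidean_distance` and the sample values are illustrative assumptions. The code loops over however many attributes the vectors have, so nothing changes when you go from 2 attributes to 5 or more.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two feature vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of attributes")
    # Sum the squared per-attribute differences, then take the square root
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Two hypothetical samples, 5 attributes each
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 3.0, 6.0, 7.0, 9.0]

# Differences are 1, 1, 3, 3, 4, so d = sqrt(1 + 1 + 9 + 9 + 16) = sqrt(36)
print(euclidean_distance(a, b))  # prints 6.0
```

On Python 3.8 or later, the standard library's `math.dist(a, b)` computes the same quantity.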

However, SMOTE tends to be portrayed as a solution to a problem (class imbalance) that is often not much of a problem in the first place.
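On the first part of the question, SMOTE's way of building a synthetic sample also does not depend on the number of attributes. For a minority-class sample $x$, it finds the $k$ nearest minority-class neighbours by Euclidean distance, picks one of them ($x_{nn}$) at random, and interpolates $x_{\text{new}} = x + \lambda\,(x_{nn} - x)$ with a single random $\lambda \in [0, 1]$ shared by every attribute. The following is a minimal from-scratch sketch under that description. It is not the imbalanced-learn implementation. The helper name `smote_sample`, its parameters, and the data are hypothetical.

```python
import math
import random

def euclidean_distance(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def smote_sample(minority, i, k=5, rng=random):
    """Create one synthetic sample from minority[i] (hypothetical helper)."""
    x = minority[i]
    # k nearest minority-class neighbours of x, excluding x itself
    others = [m for j, m in enumerate(minority) if j != i]
    neighbors = sorted(others, key=lambda m: euclidean_distance(x, m))[:k]
    nn = rng.choice(neighbors)
    gap = rng.random()  # one random gap in [0, 1), shared by all attributes
    # Move from x toward nn by the same fraction along every attribute
    return [xa + gap * (na - xa) for xa, na in zip(x, nn)]

# Hypothetical minority-class data with 5 attributes
minority = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, 6.0, 7.0, 9.0],
    [1.5, 2.5, 3.5, 4.5, 5.5],
    [3.0, 1.0, 2.0, 5.0, 4.0],
]
new = smote_sample(minority, 0, k=2, rng=random.Random(0))
print(new)
```

The new point always lies on the line segment between the original sample and one of its neighbours. That is why the only step where the number of attributes matters is the distance computation.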

Dave
    I suspect SMOTE is useful for classification techniques where cost-sensitive learning is not readily available and where there is no direct means of controlling over-fitting, which I suspect was true for Naive Bayes, individual decision trees and RIPPER implementations at the time the paper was written. I don't think it ought to be used with modern methods, like the SVM, which supports both cost-sensitive learning and regularization (to avoid over-fitting). – Dikran Marsupial Jan 17 '22 at 15:32