SMOTE algorithm

Question

When a dataset has 5 or more attributes, how does the SMOTE algorithm produce a new sample? How is the Euclidean distance calculated with 5 or more attributes?

Why are 5 or more attributes (variables) an issue? Euclidean distance can be computed in arbitrary dimensions. – user2974951, Jan 17 '22 at 09:38

score 1 · Answer 1 · answered Jan 17 '22 at 15:27


The Euclidean distance $d$ between vectors $x,y\in\mathbb R^n$ is:

$$ d(x,y)= \sqrt{ \sum_{i=1}^n \bigg( x_i-y_i \bigg)^2 } $$

If the dimension is $2$, that’s the formula. If the dimension is $5$, that’s the formula. If the dimension is $1234567890987654321$, that’s the formula.
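To make that concrete, here is a minimal sketch in plain Python. It assumes the usual description of SMOTE: pick a minority-class point $x$, pick one of its $k$ nearest minority-class neighbours $x'$, and generate $x_{\text{new}} = x + u\,(x' - x)$ with $u$ drawn uniformly from $[0, 1)$. The function names `euclidean_distance` and `smote_like_sample` and the data values are made up for illustration; this is not the code of any particular library.

```python
import math
import random


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of attributes")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def smote_like_sample(x, minority, k=2, rng=random):
    """Interpolate between x and one of its k nearest minority neighbours.

    Hypothetical helper for illustration only, not a library API.
    """
    # Rank the other minority points by distance to x (works in any dimension).
    others = [p for p in minority if p is not x]
    neighbours = sorted(others, key=lambda p: euclidean_distance(x, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # uniform in [0, 1)
    # The new point lies on the segment between x and the chosen neighbour.
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)]


# Five attributes per sample; the values are invented for this example.
minority = [
    [1.0, 2.0, 0.5, 3.0, 1.5],
    [1.2, 1.8, 0.7, 2.9, 1.4],
    [0.9, 2.2, 0.4, 3.1, 1.6],
    [5.0, 6.0, 4.0, 7.0, 5.5],
]
new_point = smote_like_sample(minority[0], minority, k=2, rng=random.Random(0))
```

Nothing in either function depends on the number of attributes: the same sum and the same interpolation run over however many coordinates each sample has.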

However, SMOTE tends to be portrayed as a solution to something that isn’t such a problem.


Dave



I suspect SMOTE is useful for classification techniques where cost-sensitive learning is not readily available and where there is no direct means of controlling over-fitting, which I suspect was true for Naive Bayes, individual decision trees and RIPPER implementations at the time the paper was written. I don't think it ought to be used with modern methods, like the SVM, which supports both cost-sensitive learning and regularization (to avoid over-fitting). – Dikran Marsupial Jan 17 '22 at 15:32
