Consider a dataset of n data points x_i in m-dimensional space. Each x_i has a label y_i giving the class it belongs to. Let's say there are 5 classes, 1, 2, 3, 4, 5, with a total order among them (i.e. 1 < 2 < 3 < 4 < 5).
What I want to do is analyse the sensitivity of the algorithm to noise in the dataset. That is, I will add progressively more noise to the dataset and check how well the classifier performs when it is trained on the noisy data.
The question: What is the proper way of adding (generating) the noise?
My personal guess is that I will need to normalize the values and then add noise drawn from a Gaussian distribution, but I am not sure what the proper way of doing this is. I don't want to make any statistical or mathematical mistake.
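To make the guess concrete, here is a minimal sketch of what I have in mind, using NumPy. It is only an illustration under my own assumptions, not a claim that this is the correct procedure: the function name `add_gaussian_noise`, the parameter `noise_level` (noise standard deviation measured in units of each feature's standard deviation), the chosen noise levels, and the toy data are all made up for the example. It perturbs only the features x_i and leaves the labels y_i untouched.

```python
import numpy as np


def add_gaussian_noise(X, noise_level, rng=None):
    """Standardize each feature, then add zero-mean Gaussian noise.

    noise_level (hypothetical parameter) is the standard deviation of the
    noise, expressed in units of each feature's own standard deviation.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant features: avoid division by zero
    Z = (X - mean) / std  # every feature now has mean 0 and std 1
    return Z + rng.normal(0.0, noise_level, size=Z.shape)


# Toy data (assumption): n = 200 points in m = 3 dimensions.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))

# Progressively noisier copies of the same dataset.
for level in (0.0, 0.1, 0.5, 1.0):
    X_noisy = add_gaussian_noise(X, level, rng)
    print(level, X_noisy.std(axis=0).round(2))
```

The idea behind standardizing first is that a single `noise_level` then means the same thing for every feature, regardless of its original scale. Each `X_noisy` would be used to train the classifier, and the accuracy compared across levels.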