I have a dataset in which every feature is a float, and the values within each feature vary very little. Here is part of the training set:
95.08273,94.13686,95.843,95.83886,95.38811,1
94.37234,93.47385,94.54948,94.67984,93.80062,1
94.02294,94.96799,95.075,95.41348,94.93842,1
95.1664,94.84861,94.82346,95.61005,96.62745,0
95.23271,94.87994,95.42258,95.48337,96.3997,0
93.77203,94.3065,94.33946,93.70812,93.42625,0
94.79427,94.70049,94.40502,94.61435,94.92593,1
The last column of each training sample is the class label, either 0 or 1, and I need to predict it for similar test samples.
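For reference, this is how I load the data and split off the labels (the array below just repeats the sample rows above so the snippet runs standalone; in practice I read the full file):

```python
import numpy as np

# Sample of the training data from above; the last column is the label
data = np.array([
    [95.08273, 94.13686, 95.843,   95.83886, 95.38811, 1],
    [94.37234, 93.47385, 94.54948, 94.67984, 93.80062, 1],
    [94.02294, 94.96799, 95.075,   95.41348, 94.93842, 1],
    [95.1664,  94.84861, 94.82346, 95.61005, 96.62745, 0],
    [95.23271, 94.87994, 95.42258, 95.48337, 96.3997,  0],
    [93.77203, 94.3065,  94.33946, 93.70812, 93.42625, 0],
    [94.79427, 94.70049, 94.40502, 94.61435, 94.92593, 1],
])

X = data[:, :-1]              # five float features per sample
y = data[:, -1].astype(int)   # class labels: 0 or 1

print(X.shape, y.tolist())    # → (7, 5) [1, 1, 1, 0, 0, 0, 1]
```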
I can't figure out what basis a classifier could use to separate the two classes, since the training samples look very similar across both labels.
I have already tried scaling the data with scikit-learn's preprocessing.scale method, and also standardizing manually by taking each column's mean, subtracting it from each sample x_i, and dividing by the column's standard deviation, but accuracy stays the same as before.
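As far as I can tell, the two things I tried are the same transformation, which would explain why accuracy didn't move. A minimal check (using a few of the rows above as the feature matrix):

```python
import numpy as np
from sklearn import preprocessing

# A few training rows from above, features only (labels dropped)
X = np.array([
    [95.08273, 94.13686, 95.843,   95.83886, 95.38811],
    [94.37234, 93.47385, 94.54948, 94.67984, 93.80062],
    [95.1664,  94.84861, 94.82346, 95.61005, 96.62745],
])

# Approach 1: scikit-learn's helper
X_scaled = preprocessing.scale(X)

# Approach 2: manual column-wise standardization, (x_i - mean) / std
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# Both produce the same result
print(np.allclose(X_scaled, X_manual))  # → True
```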
How should I normalize the data, and which classifier should I use in this case?
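For context, this is roughly my current pipeline. KNeighborsClassifier here is only a placeholder, since the choice of classifier is exactly what I'm asking about, and the data array just repeats the sample rows so the snippet runs standalone:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import scale

# Toy sample from the question; in practice X/y hold the full training set
data = np.array([
    [95.08273, 94.13686, 95.843,   95.83886, 95.38811, 1],
    [94.37234, 93.47385, 94.54948, 94.67984, 93.80062, 1],
    [94.02294, 94.96799, 95.075,   95.41348, 94.93842, 1],
    [95.1664,  94.84861, 94.82346, 95.61005, 96.62745, 0],
    [95.23271, 94.87994, 95.42258, 95.48337, 96.3997,  0],
    [93.77203, 94.3065,  94.33946, 93.70812, 93.42625, 0],
    [94.79427, 94.70049, 94.40502, 94.61435, 94.92593, 1],
])
X, y = data[:, :-1], data[:, -1].astype(int)

# Standardize, then fit; the classifier choice is a placeholder
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(scale(X), y)

# Training accuracy on the same tiny sample (not a real evaluation)
print(clf.score(scale(X), y))
```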