
I have a dataset in which every feature is a float value, and the data points within each feature do not vary much. Here is a part of the training-set data:

95.08273,94.13686,95.843,95.83886,95.38811,1
94.37234,93.47385,94.54948,94.67984,93.80062,1
94.02294,94.96799,95.075,95.41348,94.93842,1
95.1664,94.84861,94.82346,95.61005,96.62745,0
95.23271,94.87994,95.42258,95.48337,96.3997,0
93.77203,94.3065,94.33946,93.70812,93.42625,0
94.79427,94.70049,94.40502,94.61435,94.92593,1

The last column of each training sample is the class label, either 0 or 1, and I need to predict the same for similar test samples.

I can't seem to determine what basis there could be for classifying the test data, since the samples in the training data are all very similar to one another.

I have already tried scaling the data with scikit-learn's preprocessing.scale method, as well as taking the mean of each column, subtracting it from each sample (xi), and then dividing by the standard deviation, but the accuracy is the same as before.
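For concreteness, here is roughly what I did (a minimal sketch using the sample rows above; note that preprocessing.scale computes exactly this column-wise z-score, so the two attempts are mathematically equivalent, which is presumably why they gave the same accuracy):

```python
import numpy as np
from sklearn import preprocessing

# A few of the training samples from above; the last column is the class label.
data = np.array([
    [95.08273, 94.13686, 95.84300, 95.83886, 95.38811, 1],
    [94.37234, 93.47385, 94.54948, 94.67984, 93.80062, 1],
    [94.02294, 94.96799, 95.07500, 95.41348, 94.93842, 1],
    [95.16640, 94.84861, 94.82346, 95.61005, 96.62745, 0],
    [95.23271, 94.87994, 95.42258, 95.48337, 96.39970, 0],
    [93.77203, 94.30650, 94.33946, 93.70812, 93.42625, 0],
    [94.79427, 94.70049, 94.40502, 94.61435, 94.92593, 1],
])
X, y = data[:, :-1], data[:, -1]

# Attempt 1: scikit-learn's column-wise standardization.
X_scaled = preprocessing.scale(X)

# Attempt 2: manual z-score, (x_i - mean) / std per column.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The two results are identical up to floating-point error.
print(np.allclose(X_scaled, X_manual))  # True
```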

How should I normalize the data, and which classifier should I use in this case?

Jarvis
  • I have updated my question description; the answers there didn't solve my problem, please have a look. @HongOoi – Jarvis Sep 12 '16 at 01:28
  • Although the answers there may not have helped you, you still seem to be asking for essentially the same thing so it is difficult to see how this could generate substantially different answers. You should probably try to make this question more clearly distinct from your previous one. You might be better to ask "why did normalization not help" than "what's a way to normalize, Take Two". – Silverfish Sep 12 '16 at 08:09
  • I have done the same, and the question now clearly highlights what the problem was earlier. Can you answer now? @Silverfish – Jarvis Sep 12 '16 at 08:34

0 Answers