
I have made a recording of two different sounds and I want to use an SVM to distinguish between the two. The process I have followed is:

  1. Divided each sound into multiple 20 ms frames.
  2. For every frame I calculate the MFCCs, deltas and delta-deltas (48 coefficients in total, which act as the features).
  3. I create a label for each sound (e.g. 0 and 1).
  4. Using the cross_validation.train_test_split function from scikit-learn, I create my feature_train, feature_test, label_train and label_test datasets.
  5. I train my SVM using the RBF kernel and the train datasets from the step above.
  6. I make a prediction based on the test datasets of step 4.
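The steps above can be sketched roughly as follows. Since the actual recordings aren't available here, randomly generated arrays stand in for the 48-coefficient MFCC feature matrix; the shapes and the scikit-learn calls follow the pipeline described (note that `cross_validation.train_test_split` was later moved to `model_selection.train_test_split`):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for steps 1-2: 48 MFCC/delta/delta-delta
# coefficients per 20 ms frame (in practice computed from the audio,
# e.g. with a library such as librosa).
rng = np.random.RandomState(0)
features_a = rng.normal(loc=0.0, size=(200, 48))   # frames from sound A
features_b = rng.normal(loc=2.0, size=(200, 48))   # frames from sound B

X = np.vstack([features_a, features_b])
y = np.array([0] * 200 + [1] * 200)                # step 3: one label per frame

# Step 4: held-out split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Steps 5-6: RBF-kernel SVM, then predict on the held-out frames
clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(acc)
```

With well-separated synthetic classes like these, the held-out accuracy comes out near 1, which is exactly why a perfect score alone doesn't tell you whether the real task is easy or the model is overfitting.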

My question is the following: how do I know that the score I get from the SVM is correct? How can I tell that my classifier does not overfit? I have also used cross-validation and calculated the ROC curve, but I don't know whether I should trust the results. I am attaching the image and the results of some metrics.
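One concrete way to probe overfitting is to look at the spread of scores across cross-validation folds rather than a single split: if the folds agree and none of them is suspiciously low, the score is at least stable. A minimal sketch, again with synthetic features standing in for the real MFCC matrix:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the frame-level feature matrix (48 coefficients each)
rng = np.random.RandomState(1)
X = np.vstack([rng.normal(0.0, 1.0, (150, 48)),
               rng.normal(1.5, 1.0, (150, 48))])
y = np.array([0] * 150 + [1] * 150)

# Stratified k-fold CV: a large gap between folds (high std), or between
# training accuracy and CV accuracy, is a warning sign of overfitting.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(SVC(kernel="rbf", gamma="scale"), X, y, cv=cv)
print(scores.mean(), scores.std())
```

A mean near 1.0 with near-zero standard deviation on data this separable is expected; on real recordings, high variance across folds would be the thing to investigate.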

Metrics

Accuracy of SVM = 1

F1-score of SVM = 1

[Image: ROC curve]

1 Answer


Well, assuming you did everything the way you intended to (and it's never a bad idea to check), you have to tell us - because you created the labels! I would assume that if you generated the data, you should be able to classify it again.

You should tell us how you labeled the data, and you should spot-check a few points if you're suspicious of your results, but the biggest issue here is that you created both the question and the answer.
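Beyond spot-checking, one cheap sanity test is to shuffle the labels and retrain: if the classifier still scores well on randomly permuted labels, something is leaking. A sketch with hypothetical synthetic features in place of the real MFCCs:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the 48-coefficient frame features
rng = np.random.RandomState(2)
X = np.vstack([rng.normal(0.0, 1.0, (150, 48)),
               rng.normal(1.5, 1.0, (150, 48))])
y = np.array([0] * 150 + [1] * 150)

# CV accuracy with the true labels
real = cross_val_score(SVC(kernel="rbf", gamma="scale"), X, y, cv=5).mean()

# CV accuracy with the labels randomly permuted: for two balanced
# classes this should hover around 0.5 (chance level)
y_shuffled = rng.permutation(y)
chance = cross_val_score(SVC(kernel="rbf", gamma="scale"),
                         X, y_shuffled, cv=5).mean()

print(real, chance)
```

A large gap between the real score and the shuffled-label score is reassuring; if the two are close, the perfect accuracy is an artifact of the evaluation rather than the signal.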

– one_observation
  • Sorry for the late reply. So it turns out that my results were correct after all. I made new recordings that acted as observations, and the SVM's prediction was correct every time. The reason I posted this question was fear of over-fitting, but it seems that my algorithm works. – user103394 Feb 23 '16 at 15:46