My target has 5 classes. My model's accuracy on the test dataset is about 34%. Can I assume this is a reasonable model purely based on classification accuracy, since random guessing would give 20%?

Ippei
  • Typically, standards for "satisfactory performance" are shaped outside statistics. They are determined by the context of your problem and traditions in your research field... Will your classification method make a positive impact in your field? Will the improvement relative to random guessing generate sufficient improvement in welfare? Are you sure that you cannot raise the accuracy (correct classification rate) even further? – stans Aug 18 '18 at 13:32
  • Also make sure your classes are **balanced** (i.e. have the same number of samples in each class). Imagine one of the five classes has 50% of the samples. By predicting just this class, a dummy classifier could achieve an accuracy of 50%. – Djib2011 Aug 18 '18 at 13:46
  • Very very *very* relevant: [Why is accuracy not the best measure for assessing classification models?](https://stats.stackexchange.com/q/312780/1352) – Stephan Kolassa Aug 18 '18 at 18:30

1 Answer

Not necessarily: that reasoning assumes the 5 classes are equally distributed. Consider imbalanced data where 90% of the observations are class 1 and the rest are split among classes 2-5. If we predicted class 1 for every single observation, we could expect an accuracy of 90%! This is clearly not a useful model, which demonstrates the importance of looking at other performance metrics such as precision, recall and ROC curves. Accuracy can still be considered, but the minimum baseline to beat is the accuracy of always predicting the majority class, not 1 divided by the number of classes.
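To make the baseline concrete, here is a minimal sketch in plain Python. The labels are made up to match the 90% example above, and the helper names (`majority_baseline_accuracy`, `accuracy`, `per_class_recall`) are hypothetical, not from any library. It shows that an always-predict-class-1 model matches the majority baseline on accuracy while having zero recall on every other class.

```python
from collections import Counter


def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(y_true)
    return max(counts.values()) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each class, the fraction of its observations predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / totals[c] for c in sorted(totals)}


# Hypothetical labels: 90% class 1, the rest split among classes 2-5.
y_true = [1] * 90 + [2] * 3 + [3] * 3 + [4] * 2 + [5] * 2
# A "dummy" model that predicts class 1 for every observation.
y_pred = [1] * len(y_true)

baseline = majority_baseline_accuracy(y_true)
acc = accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)

print(f"majority baseline: {baseline:.2f}")
print(f"model accuracy:    {acc:.2f}")
print(f"per-class recall:  {recall}")
```

For the question's case, the same check would be to compute `majority_baseline_accuracy` on the actual test labels and compare it with the 34% figure, rather than with 20%.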

Edit: Just noticed Djib2011 beat me to the same point in the comments.

Seraf Fej