
I am testing the accuracy of a machine learning approach that counts cars in images. I have both a predicted dataset and a "real" dataset that was generated by a human. For example, this is what my data looks like:

image   real_count  predicted_count
A       6           6
B       5           6
C       0           1
D       7           6
E       6           6
F       9           11
G       1           1

I am trying to assess how well the predicted data holds up against the real data. Is it appropriate to use a confusion matrix and the associated measures of agreement such as kappa to assess the accuracy of the predicted data? Is there a more suitable measure of accuracy for this type of frequency data?

Aaron

1 Answer


I would not use a confusion matrix in your case.

A confusion matrix only records whether a prediction is exactly right or not: for all images with 9 cars, how often did we predict 9, and how often something else? But if we did not predict 9, it matters whether we predicted 8 or 3. A prediction of 8 is wrong, but it is much closer to the truth than a prediction of 3, and the confusion matrix (and kappa) ignores that distinction entirely.

Instead, I would recommend that you use standard point forecast accuracy measures for numerical predictions, like the (R)MSE. In deciding between the (R)MSE and, say, the MAE or MAPE, I very much recommend "Why use a certain measure of forecast error (e.g. MAD) as opposed to another (e.g. MSE)?"
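
As a minimal sketch (not part of the original answer), here is how the MAE, MSE and RMSE could be computed with NumPy on the seven example images listed in the question:

```python
import numpy as np

# Counts copied from the example table in the question
real      = np.array([6, 5, 0, 7, 6, 9, 1])
predicted = np.array([6, 6, 1, 6, 6, 11, 1])

errors = predicted - real

mae  = np.mean(np.abs(errors))   # mean absolute error
mse  = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)              # root mean squared error

print(f"MAE:  {mae:.3f}")   # 0.714
print(f"MSE:  {mse:.3f}")   # 1.000
print(f"RMSE: {rmse:.3f}")  # 1.000
```

On this small sample the MAE is about 0.71 cars per image and the RMSE is 1.0; which one you report depends on how you want to penalize large errors, which is exactly what the linked question discusses.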

Stephan Kolassa