This seems like a simple question, but I can't find a way to solve it or even formulate a solution that makes sense.
My case is that I have an algorithm that detects fuel theft (ft_calc). I want to compare its output against a ground truth (ft_gt) built by visually/manually inspecting the data.
A simple way to compute error would be to take the percentage error with respect to the ground truth, i.e. error_perc = |ft_calc - ft_gt| / ft_gt.
However, and this is where it gets interesting, the fuel theft values we see range widely, e.g. from 14 liters to 400 liters.
When ft_calc is 390, a 10 liter difference gives an error of 2.5% against a ground truth of 400, but when ft_calc is 4, the same 10 liter difference gives an error of roughly 71% against a ground truth of 14.
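To make the asymmetry concrete, here is a minimal Python sketch of that computation (error_perc is just the formula above written as a function; it is not part of my actual code):

```python
# Minimal sketch of the naive percentage-error metric described above.
# ft_calc / ft_gt follow the naming in the question; this is illustrative only.

def error_perc(ft_calc: float, ft_gt: float) -> float:
    """Absolute error relative to the ground-truth theft volume."""
    return abs(ft_calc - ft_gt) / ft_gt

# The same 10 liter miss produces very different relative errors:
print(error_perc(390, 400))  # 0.025  -> 2.5%
print(error_perc(4, 14))     # ~0.714 -> ~71%
```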
This is a perfectly normal way to compute accuracy, but you can see that the algorithm being evaluated is unfairly penalized when small thefts occur (where the chance and impact of noise are much greater). Is there a metric that normalizes for the magnitude of the actual theft being detected? I can't come up with a sensible one.
Another problem, in the same fuel theft space, arises when one of the two reports no fuel theft and the other does.
For example, when ft_gt = 0 but ft_calc = 5 liters, what is the error (the formula divides by zero)? Conversely, when ft_gt = 5 liters but ft_calc = 0, the error is |0 - 5| / 5 = 100%, which again unfairly biases the accuracy of the algorithm.
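The zero cases make the same sketch break down entirely (again, purely an illustration):

```python
import math

# Illustration of the two zero-valued edge cases; ft_calc / ft_gt are the
# names from the question, not existing code.

def error_perc(ft_calc: float, ft_gt: float) -> float:
    if ft_gt == 0:
        # Division by zero: the relative error is undefined whenever the
        # ground truth reports no theft, even for a tiny false positive.
        return math.inf
    return abs(ft_calc - ft_gt) / ft_gt

print(error_perc(5, 0))  # inf -> the metric blows up on any false positive
print(error_perc(0, 5))  # 1.0 -> a missed 5 liter theft counts as 100% error
```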
Please suggest how to attack this problem. I am currently trying to separate the data into different regions of fuel theft (small, medium, large) and handle each region separately, as sketched below.
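For reference, this is roughly what I mean by handling regions separately (the bin edges of 20 L and 100 L are placeholders I made up, not values from my data):

```python
from collections import defaultdict

# Rough sketch of the binning idea: split events by the ground-truth theft
# size and report the mean relative error per bin. Bin edges are placeholders.

def theft_bin(ft_gt: float) -> str:
    if ft_gt < 20:
        return "small"
    elif ft_gt < 100:
        return "medium"
    return "large"

def per_bin_error(pairs):
    """pairs: iterable of (ft_calc, ft_gt) tuples."""
    errors = defaultdict(list)
    for ft_calc, ft_gt in pairs:
        if ft_gt > 0:  # skip the undefined ft_gt == 0 case for now
            errors[theft_bin(ft_gt)].append(abs(ft_calc - ft_gt) / ft_gt)
    return {region: sum(errs) / len(errs) for region, errs in errors.items()}

print(per_bin_error([(390, 400), (4, 14), (95, 100)]))
# e.g. {'large': 0.0375, 'small': 0.714...}
```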