I'm testing the robustness of an image processing algorithm that detects a certain feature in images. The output of the algorithm is simple: 1 if the feature is found, 0 if it is not. An image may or may not truly contain the feature, and the ground truth for each image is recorded.
I apply the algorithm to 10,000 images, both with and without added noise. Ideally, if the algorithm is robust against the noise, its output for the same image should remain identical.
The resulting algorithm output for the 10,000 images looks like this:
          alg on rawImage   alg on (rawImage+noise)   ground truth
image#1          1                    1                    1
image#2          0                    1                    1
image#3          0                    0                    0
image#4          1                    0                    0
...
As seen above, the algorithm extracts the feature in some images but fails in others, and its output can also disagree with the ground truth.
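To make this concrete, here is a minimal Python sketch of how I summarize each output column against the ground truth (the arrays raw_out, noisy_out, and truth are placeholders filled with the four example rows above; in practice they hold all 10,000 values):

```
import numpy as np

raw_out   = np.array([1, 0, 0, 1])  # algorithm output on the raw images
noisy_out = np.array([1, 1, 0, 0])  # algorithm output on the noisy images
truth     = np.array([1, 1, 0, 0])  # ground truth per image

# Plain accuracy of each column against the ground truth
acc_raw   = np.mean(raw_out == truth)
acc_noisy = np.mean(noisy_out == truth)
print(acc_raw, acc_noisy)
```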
Could anyone suggest a measure of the algorithm's robustness or accuracy for my case?
I would also like to compare the algorithm's performance on images with and without added noise, i.e., the first and second columns of the results shown above. Apart from a confusion matrix, is there any other statistical measure I could use?
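By confusion matrix I mean the 2x2 cross-tabulation of the two output columns, roughly like this (a sketch using the same placeholder arrays as above):

```
import numpy as np

raw_out   = np.array([1, 0, 0, 1])  # placeholder for the full 10,000-image column
noisy_out = np.array([1, 1, 0, 0])

# 2x2 agreement table: rows = output on raw image, columns = output on noisy image
table = np.zeros((2, 2), dtype=int)
for r, n in zip(raw_out, noisy_out):
    table[r, n] += 1
print(table)

# Fraction of images whose output is unchanged by the noise
agreement = np.trace(table) / table.sum()
print(agreement)
```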