One way would be to compute a simple signal-to-noise estimate, using the pixel intensities and the ground-truth image as a mask. Here's an example of what I mean in Python:
import numpy as np
# Sample data
# Indices [0, 1, 2, 6] are background, [3, 4, 5] are object
original = np.array([1, 2, 2, 5, 6, 8, 2])
truth = np.array([0, 0, 0, 1, 1, 1, 0])
back_mask = 1 - truth
result = np.array([0, 1, 0, 4, 5, 7, 1]) # Result of background subtraction
def non_masked_mean(input_array, mask):
"""mean of non-masked elements in the array"""
return np.ma.masked_array(input_array, mask).mean()
def snr(input_array, back_mask):
    """Ratio of mean object intensity to mean background intensity"""
    with np.errstate(all='ignore'):
        ratio = non_masked_mean(input_array, back_mask) / non_masked_mean(input_array, 1 - back_mask)
    return 0 if np.isnan(ratio) else ratio
snr_before = snr(original, back_mask) # (mean of objects / mean of background), before
snr_after = snr(result, back_mask) # (mean of objects / mean of background), after
snr_ratio = snr_after / snr_before
if snr_after == np.inf:
    print("Perfect SNR in result! Either great or suspect...")
elif snr_after == 0:
    print("Image was flattened :(")
else:
    print("SNR changed by a multiple of %.2f" % snr_ratio)
The problem with this is that you get a perfect score by favouring a heavy-handed background subtraction: even if only one pixel of the object survives, the SNR will be infinite.
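Here's a minimal sketch of that failure mode, reusing the helpers above (the over-aggressive result array is hypothetical):

# Hypothetical heavy-handed result: everything zeroed except one object pixel
aggressive = np.array([0, 0, 0, 0, 6, 0, 0])
print(snr(aggressive, back_mask))  # inf, because the background mean is exactly 0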
Another approach would be to consider the background and the signal separately and weight the two results. Then you can decide whether it's more important to flatten the background or to preserve the objects:

def non_masked_mean_ratio(input_array, result_array, mask):
    """Ratio of the means of two arrays under the same mask"""
    return non_masked_mean(result_array, mask) / non_masked_mean(input_array, mask)
signal_ratio = non_masked_mean_ratio(original, result, back_mask) # (mean of objects after) / (mean of objects before)
background_ratio = non_masked_mean_ratio(original, result, truth) # (mean of background after) / (mean of background before)
weight = 0.5 # Higher weight favours preservation of objects
score = (weight*signal_ratio) + ((1-weight) * (1-background_ratio))
This gives the score a nice expected range of roughly 0 to 1, where 1 is the ideal result. It's still not perfect, though: you can inflate it by increasing the intensity of the objects in the result, as the snippet below shows.
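A quick sketch of that exploit, reusing the arrays and helpers above (the boosted result is hypothetical):

# Hypothetical "cheating" result: same background, object pixels boosted 10x
boosted = result * np.where(truth == 1, 10, 1)
boosted_signal_ratio = non_masked_mean_ratio(original, boosted, back_mask)
boosted_background_ratio = non_masked_mean_ratio(original, boosted, truth)
boosted_score = (weight * boosted_signal_ratio) + ((1 - weight) * (1 - boosted_background_ratio))
print("score: %.2f, boosted score: %.2f" % (score, boosted_score))  # 0.78 vs 4.57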
If you're expecting a binary result, you can use a similar strategy, but with the number of non-zero pixels instead of the mean:
def nonzero_count_ratio(result_array, ideal_array):
    """Fraction of the non-zero pixels in ideal_array that are also non-zero in result_array"""
    return np.count_nonzero(result_array * ideal_array) / np.count_nonzero(ideal_array)
true_obj_ratio = nonzero_count_ratio(result, truth) # Number of correct object pixels vs number of potential correct
false_back_ratio = nonzero_count_ratio(result, back_mask) # Number of false background pixels vs number of potential false
weight = 0.5
score = (weight*true_obj_ratio) + ((1-weight) * (1-false_back_ratio))
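For example, with a hypothetical thresholded result that misses one object pixel and keeps one stray background pixel:

# Hypothetical binary result: one object pixel missed, one background pixel kept
binary_result = np.array([0, 0, 0, 1, 1, 0, 1])
true_obj = nonzero_count_ratio(binary_result, truth)        # 2/3 of object pixels recovered
false_back = nonzero_count_ratio(binary_result, back_mask)  # 1/4 of background pixels survive
print("score: %.2f" % ((weight * true_obj) + ((1 - weight) * (1 - false_back))))  # score: 0.71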