I learned today that a staff member at my company deliberately biased a sample set. They selected items for the sample that they knew were different from (more positive than) the population as a whole.
I am now trying to understand the best way of dealing with it while minimizing rework.
Unfortunately, reworking any item in the test set is expensive and time-consuming.
We are testing computerized object recognition in complex environments: we score the computerized system by comparing its output against human-categorized images. It takes a human half a day to score a single sample item for this test.
So, I'd like to reuse as many of the already-scored items as possible in my new sample set.
I would like advice on the best way to do this.
My thinking so far is to treat the situation as a stratified sample in which some parts of the sample set (strata?) are already complete while other strata are not. I would then randomly select additional items that fit the definitions of the incomplete strata.
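Here is a minimal Python sketch of that stratified top-up idea, under a few assumptions. It assumes every item in the population already carries a stratum label (the hypothetical `stratum_of` mapping). It uses proportional allocation as the per-stratum target and never discards an already-scored item. Strata that the biased selection over-filled are then handled at analysis time by weighting each stratum's mean by its population share N_h/N. All function names are made up for illustration. The key assumption, which no code can check, is that within each stratum the already-scored items behave like a random draw. If favorable items were also picked within a stratum, reusing them carries that bias forward.

```python
import random
from collections import defaultdict


def proportional_allocation(stratum_sizes, n_total):
    """Split n_total across strata in proportion to stratum size.

    Integer largest-remainder rounding makes the counts sum to n_total exactly.
    Assumes n_total <= total population size.
    """
    N = sum(stratum_sizes.values())
    alloc = {h: n_total * s // N for h, s in stratum_sizes.items()}
    remainder = {h: n_total * s % N for h, s in stratum_sizes.items()}
    leftover = n_total - sum(alloc.values())
    # Hand the remaining units to the strata with the largest remainders.
    for h in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc


def top_up_sample(stratum_of, already_scored, n_total, seed=0):
    """Keep every already-scored item; randomly draw unscored items to fill strata.

    stratum_of: hypothetical dict mapping each population item id -> stratum label
    already_scored: set of item ids that have already been scored
    Returns a dict: stratum -> list of new item ids that still need scoring.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for item, h in stratum_of.items():
        by_stratum[h].append(item)
    sizes = {h: len(items) for h, items in by_stratum.items()}
    alloc = proportional_allocation(sizes, n_total)
    to_score = {}
    for h, items in by_stratum.items():
        n_scored = sum(1 for i in items if i in already_scored)
        unscored = [i for i in items if i not in already_scored]
        # Strata already at or above their target need no new items.
        need = max(0, alloc[h] - n_scored)
        to_score[h] = rng.sample(unscored, need)
    return to_score


def stratified_mean(stratum_sizes, scores_by_stratum):
    """Weight each stratum's sample mean by its population share N_h / N.

    Assumes every stratum has at least one scored item.
    """
    N = sum(stratum_sizes.values())
    return sum(
        stratum_sizes[h] / N * (sum(ys) / len(ys))
        for h, ys in scores_by_stratum.items()
    )


if __name__ == "__main__":
    # Toy population: 100 "easy" and 300 "hard" scenes; the 20 scored items are all easy.
    stratum_of = {f"e{i}": "easy" for i in range(100)}
    stratum_of.update({f"h{i}": "hard" for i in range(300)})
    already = {f"e{i}" for i in range(20)}
    plan = top_up_sample(stratum_of, already, n_total=40)
    print({h: len(v) for h, v in plan.items()})  # {'easy': 0, 'hard': 30}
```

In the toy run, the "easy" stratum's target of 10 is already exceeded by the 20 biased picks, so only 30 "hard" items need new human scoring. Keeping all 20 easy items means the final sample is not self-weighting, which is why the estimate should come from `stratified_mean` rather than a plain average.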
At any rate, please share your thoughts and advice. Thanks in advance!