Consider the following scenario:
- I trained an object detector (Viola-Jones) on images of 8 subjects, evaluated on 3 other subjects (the test set), and achieved recall and precision of roughly 82% and 80%, respectively.
- I then trained Viola-Jones on images of 14 subjects (including the 8 above), evaluated on the same test set, and achieved recall and precision of 80% and 78%.
This came as a great surprise to me, since I assumed the test set error should decrease as the training set grows. What could be possible reasons for this behaviour?
I can only think of:
- Chance. But what exactly would that tell us about the method used (Viola-Jones)?
- The additional data is noisy. Possible, but unlikely...
PS: I also evaluated both scenarios in a cross-validation manner, always leaving one subject out as the test set. There, the performance drop from scenario 1 to scenario 2 was even slightly more pronounced.
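For reference, this is how I compute the per-fold precision and recall in the leave-one-subject-out evaluation (a minimal sketch; the subject names and detection counts below are hypothetical, not my actual data):

```python
from typing import Dict, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def leave_one_subject_out(counts: Dict[str, Tuple[int, int, int]]) -> Dict[str, Tuple[float, float]]:
    # counts[subject] = (tp, fp, fn) on the held-out subject's images,
    # obtained from a detector trained on all remaining subjects.
    return {subject: precision_recall(*c) for subject, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical per-fold detection counts for three held-out subjects.
    counts = {"subjectA": (82, 20, 18), "subjectB": (40, 10, 10), "subjectC": (78, 22, 19)}
    for subject, (p, r) in leave_one_subject_out(counts).items():
        print(f"{subject}: precision={p:.2f}, recall={r:.2f}")
```

The reported numbers are then the averages of these per-fold values over all held-out subjects.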