
I trained a model in PyTorch on the EMNIST data set and got about 85% accuracy on the test set. Now I have an image of handwritten text from which I have extracted individual letters, but the model's accuracy on those extracted images is very poor.

The one-hot mapping that I'm using:

letters_EMNIST = {0: '0', 1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6', 7: '7', 8: '8', 9: '9',
         10: 'A', 11: 'B', 12: 'C', 13: 'D', 14: 'E', 15: 'F', 16: 'G', 17: 'H', 18: 'I', 19: 'J',
         20: 'K', 21: 'L', 22: 'M', 23: 'N', 24: 'O', 25: 'P', 26: 'Q', 27: 'R', 28: 'S', 29: 'T',
         30: 'U', 31: 'V', 32: 'W', 33: 'X', 34: 'Y', 35: 'Z', 36: 'a', 37: 'b', 38: 'd', 39: 'e',
         40: 'f', 41: 'g', 42: 'h', 43: 'n', 44: 'q', 45: 'r', 46: 't'}
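
For reference, a minimal sketch of how predictions map back to characters with this dictionary (model and images below are placeholders, not the actual pipeline):

import torch

# model is the trained network; images is a batch of (N, 1, 28, 28)
# tensors. Both are placeholders for illustration.
with torch.no_grad():
    logits = model(images)          # shape (N, 47)
preds = logits.argmax(dim=1)        # predicted class indices
chars = [letters_EMNIST[p.item()] for p in preds]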

For reference, this is an example of an image from the testing data:

[image: an EMNIST sample of a handwritten '0', with thick strokes and empty space around the digit]

And this is an example of a letter I extracted:

[image: an extracted handwritten 'D', tightly cropped and with much thinner strokes]

How can I debug this?

Aditya Das
  • Comparing the two images that you provide, they seem dramatically different. Do I understand correctly that your training data and testing data come from two different data-generating processes; for example, perhaps you used EMNIST for training and then created your test data by writing handwritten digits yourself, digitizing them, and gathering predictions using your model? – Sycorax Jan 07 '20 at 15:15
  • Why do you say they are dramatically different? The first image is a '0' and the second is a 'D'; I only provided the images to show, qualitatively, the kind of images I have extracted from text compared to the actual testing/training images. I split the original EMNIST into training and testing sets and got about 85% accuracy on both. After that I created my own handwritten letters and digits to test how the model fares on them. – Aditya Das Jan 07 '20 at 15:18
  • I say that they're dramatically different because the second image has a much thinner line width. (Also, if the second image is a D, then it appears to be rotated 90 degrees because D has a straight line on the left, not the top.) My point is that training a model on one data generating process and then applying the model to a different data generating process is very challenging because the features your model learns in one context might not be important or present in the second context, so it's not surprising that your model does worse when it's applied to different data. – Sycorax Jan 07 '20 at 15:26
  • ok, that makes much more sense, hmm, thank you! – Aditya Das Jan 07 '20 at 15:27

1 Answer


It's generally true that training a model on one data-generating process and then applying it to a different one is very challenging: the features the model learns in one context might not be important, or even present, in the other. So it's not surprising that your model does worse when applied to different data.
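
One way to make the mismatch concrete (a rough diagnostic sketch, not from the original exchange; it assumes grayscale arrays with bright ink on a dark background, matching EMNIST's polarity):

import numpy as np

def ink_stats(img):
    # Fraction of "ink" pixels, and the smallest margin between the
    # glyph and the image border. Large differences between EMNIST
    # samples and the extracted crops point to a preprocessing
    # mismatch rather than a modelling problem.
    binary = img > 0.5 * img.max()
    ys, xs = np.nonzero(binary)
    h, w = img.shape
    margin = min(ys.min(), xs.min(), h - 1 - ys.max(), w - 1 - xs.max())
    return binary.mean(), margin

Comparing these two numbers over a handful of EMNIST images and extracted crops quantifies the line-width and cropping differences directly.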

In this case, it seems like the images from EMNIST are rather different from the images you're using for testing: the EMNIST digits have thicker lines and more empty space between the edge of the digit and the edge of the image. In other words, the test images are tightly cropped while the training images aren't.
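
A plausible fix is to preprocess the extracted crops so that they look more like EMNIST before feeding them to the model. Here is a rough sketch with OpenCV; the 3x3 dilation kernel and the 20-pixel target box are assumptions meant to mimic MNIST/EMNIST-style framing, not values from the original post:

import cv2
import numpy as np

def emnistify(crop):
    # crop: grayscale uint8 array, white ink on a black background
    # (the same polarity as EMNIST).
    # 1. Thicken the strokes, since the extracted letters have much
    #    thinner lines than the EMNIST samples.
    kernel = np.ones((3, 3), np.uint8)
    crop = cv2.dilate(crop, kernel, iterations=1)
    # 2. Scale the glyph to fit a 20x20 box, then center it on a
    #    28x28 canvas, leaving empty space around it instead of a
    #    tight crop.
    h, w = crop.shape
    scale = 20.0 / max(h, w)
    new_w, new_h = max(1, round(w * scale)), max(1, round(h * scale))
    crop = cv2.resize(crop, (new_w, new_h))
    canvas = np.zeros((28, 28), np.uint8)
    top, left = (28 - new_h) // 2, (28 - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = crop
    return canvas

Orientation is worth checking too: raw EMNIST images are stored transposed relative to the natural reading orientation, which may be why the extracted 'D' looked rotated; make sure the crops match whatever orientation the model saw during training.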

Sycorax