What NN architecture to use for documents OCR?

Question

I recently got interested in document OCR and would like to gather some opinions on which NN to use. Are there any proven examples that I can build on?

I have heard that CNN+LSTM+CTC works well as an end-to-end model, but it's not easy to implement using Keras (I'm very new to the world of AI), so I wonder if there are any easier models to use, or even models that are available out of the box?

Harry · Accepted Answer · 2018-06-18T12:42:52.747

Start by looking at the paper by Shi et al. [1]. It gives a good overview of how to build a NN for text recognition. The code is open source: look for crnn on GitHub (it uses Torch as its NN framework). You can also have a look at the article I wrote about text recognition using TensorFlow [4].

For my handwritten text recognition system I started by implementing the model described by Shi (I use TensorFlow). It works and gives good results. I then extended the model by replacing some parts with components that give better results (e.g. multidimensional LSTM [2], CTC with language model [3], ...). But as you already mentioned, CNN+RNN+CTC is basically the way to go. A NN framework like TensorFlow has all those basic components included.
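
To make the CTC part of CNN+RNN+CTC a bit more concrete, here is a minimal sketch of best-path (greedy) decoding in plain Python. It is not taken from [1] or [4]; the function name `ctc_best_path_decode`, the toy alphabet, and the score values are made up for illustration, and it assumes the blank label is the last class of the network output. The RNN outputs one score vector per time step; decoding picks the best class per step, collapses repeated labels, and then removes blanks.

```python
from itertools import groupby


def ctc_best_path_decode(scores, alphabet):
    """Greedy CTC decoding for a single sample.

    scores:   list of time steps, each a list of per-class scores with
              len(alphabet) + 1 entries (characters first, blank last).
    alphabet: string whose i-th character belongs to class index i.
    """
    blank_index = len(alphabet)  # assumption: blank is the last class

    # 1. Take the highest-scoring class at every time step.
    best_path = [max(range(len(step)), key=step.__getitem__) for step in scores]

    # 2. Collapse runs of identical labels into a single label.
    collapsed = [label for label, _ in groupby(best_path)]

    # 3. Drop blanks and map the remaining labels to characters.
    return "".join(alphabet[label] for label in collapsed if label != blank_index)


# Toy output for 6 time steps over the classes 'a', 'b', blank.
toy_scores = [
    [0.8, 0.1, 0.1],  # a
    [0.6, 0.2, 0.2],  # a (repeat, collapsed with the previous step)
    [0.1, 0.1, 0.8],  # blank (separates the two 'a's)
    [0.7, 0.2, 0.1],  # a
    [0.1, 0.8, 0.1],  # b
    [0.2, 0.7, 0.1],  # b (repeat)
]
decoded = ctc_best_path_decode(toy_scores, "ab")
print(decoded)
```

The blank in the middle is what lets the decoder output a doubled character: without it, the three 'a' steps would collapse into a single 'a'.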

Just make sure to use a dataset with enough samples to get good results on the test set. I'm not familiar with OCR datasets, but for a similar problem, namely handwritten text recognition, I recommend the IAM dataset.

[1] Shi - An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

[2] Graves - Multi-Dimensional Recurrent Neural Networks

[3] Hwang - Character-Level Incremental Speech Recognition with Recurrent Neural Networks

[4] Build a Handwritten Text Recognition System using TensorFlow - https://medium.com/@harald_scheidl/2326a3487cd5
