Start by looking at the paper by Shi et al. [1]. It gives a good overview of how to build a NN for text recognition.
The code is open source: look for crnn on GitHub (it uses Torch as the NN framework). You can also have a look at the article I wrote about text recognition using TensorFlow [4].
For my handwritten text recognition system I started by implementing the model described by Shi (I use TensorFlow). It works and gives good results.
I extended the model by replacing some parts with components that give better results (e.g. multidimensional LSTM [2], CTC with a language model [3], ...).
But as you already mentioned, CNN+RNN+CTC is basically the way to go. A NN framework like TensorFlow includes all of these basic components.
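To make the CTC part a bit more concrete, here is a minimal sketch of best-path (greedy) decoding in plain Python. It is not the code from [1] or [4], and in practice you would use the decoder that ships with your framework. The function name `ctc_best_path`, the toy character set, and the convention that the blank label is the last index are assumptions made for illustration.

```python
def ctc_best_path(probs, charset):
    """Greedy (best-path) CTC decoding.

    probs:   list of T rows, one per time step of the RNN output; each row
             holds len(charset) + 1 scores (assumption: blank is the last index).
    charset: string whose i-th character belongs to label i.
    """
    blank = len(charset)  # assumed convention: blank label comes last

    # 1) Pick the most likely label at every time step.
    best = [max(range(len(row)), key=row.__getitem__) for row in probs]

    # 2) Collapse repeated labels, 3) drop blanks.
    chars = []
    prev = None
    for label in best:
        if label != prev and label != blank:
            chars.append(charset[label])
        prev = label
    return "".join(chars)


# Toy output for 5 time steps over the labels 'a', 'b' and blank.
toy_probs = [
    [0.8, 0.1, 0.1],  # a
    [0.7, 0.1, 0.2],  # a (repeat, collapsed)
    [0.1, 0.1, 0.8],  # blank (separates the two a's)
    [0.6, 0.2, 0.2],  # a
    [0.1, 0.8, 0.1],  # b
]
print(ctc_best_path(toy_probs, "ab"))  # aab
```

The blank between the second and third time step is what allows a repeated character to survive the collapsing step. Best-path decoding only looks at the single most likely label per time step, so it cannot make use of a language model.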
Just make sure to use a dataset with enough samples to get good results on the test set. I'm not familiar with OCR datasets, but for a similar problem, namely handwritten text recognition, I recommend the IAM dataset.
[1] Shi - An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
[2] Graves - Multi-Dimensional Recurrent Neural Networks
[3] Hwang - Character-Level Incremental Speech Recognition with Recurrent Neural Networks
[4] Build a Handwritten Text Recognition System using TensorFlow - https://medium.com/@harald_scheidl/2326a3487cd5