4

I trained a model for license plate recognition using stacked convolutional and LSTM layers, but I am stuck at 88% accuracy. (This accuracy is on the whole license plate, not per character.)

For training I used categorical cross-entropy. After some searching I found that a good loss function for image OCR is the CTC loss, but I am not sure it is the right choice, for two reasons:

First, each license plate has a fixed length (8 characters), so the sequences do not have the variable length that the CTC loss is designed to handle.

Second, the characters are not necessarily correlated, unlike in language or NLP tasks.

Could anyone tell me whether CTC is the right choice? If yes, why; and if not, which loss function is suitable for my task?

Panda
  • I have exactly the same question about license plate recognition. Did you find a solution to your problem? I observe two cases: *1.* additional characters inserted (often at the beginning or end) and *2.* deletion of the real repetitions of same character. – Hasnat Nov 23 '18 at 14:57
  • @Hasnat Actually, I found LSTM and CNN with categorical cross-entropy loss more useful, maybe because I did not have the right intuition about the CTC loss. – Panda Nov 23 '18 at 17:56
  • how do you finally decode the string/probability vector obtained from the CNN+RNN network? I guess you reduced/fixed the length of the recognition probability vector to exact 8 length, so that you do not need to apply an extra decoder to transform raw strings to the final output string. – Hasnat Nov 23 '18 at 19:13
  • @Hasnat yes, exactly. – Panda Nov 24 '18 at 11:43
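
The fixed-length approach described in the comments above can be sketched as follows: the network emits exactly 8 per-character probability vectors, so decoding is just an argmax per position, with no CTC-style collapsing and no extra decoder. The shapes and the alphabet here are illustrative assumptions, not the asker's actual setup.

```python
import numpy as np

# Example alphabet (hypothetical; real plates restrict letters per country).
ALPHABET = "0123456789ABCDEFGHJKLMNPRSTUVXYZ"

def decode_fixed_length(probs):
    """probs: array of shape (8, len(ALPHABET)) of per-position class
    probabilities -> the 8-character plate string via argmax per position."""
    return "".join(ALPHABET[i] for i in probs.argmax(axis=1))

# One-hot "probabilities" spelling out a fake plate:
probs = np.eye(len(ALPHABET))[[1, 2, 10, 11, 3, 4, 5, 6]]
print(decode_fixed_length(probs))  # "12AB3456"
```

This works only because the character positions are known in advance; the moment the text position or length varies across images, a segmentation-free loss such as CTC becomes attractive.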

1 Answer

5

The nice property of the CTC loss is that it trains the network in a segmentation-free manner. At the end of training, the network is able to handle these situations (which might occur in your use-case) without manually coding any special cases:

  • beginning of text in images differs from image to image: no problem, empty area will be marked by CTC-blanks
  • one character occupies multiple time-steps: no problem, they will be merged
  • the size of the region of interest differs from image to image: no problem, the characters are repeated as often as needed

So, using CTC loss should be ok for your use-case.
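
The collapsing behavior described in the bullets above can be sketched in a few lines: best-path (greedy) CTC decoding takes the most likely label at each time-step, merges repeated labels, and drops blanks. This is a minimal illustration, assuming blank has index 0; real decoders work on the probability matrix and may use beam search instead.

```python
def ctc_best_path(frame_labels, blank=0):
    """Collapse a per-time-step label sequence into output label indices:
    repeated labels are merged, then blanks are removed."""
    out = []
    prev = None
    for label in frame_labels:
        # Keep a label only if it is not blank and differs from the previous
        # time-step (a blank in between separates genuine repetitions).
        if label != blank and label != prev:
            out.append(label)
        prev = label
    return out

# Time-steps "A A blank A B B" (A=1, B=2) collapse to "A A B":
print(ctc_best_path([1, 1, 0, 1, 2, 2]))  # [1, 1, 2]
```

Note how the blank between the two runs of `1` preserves the real repetition, which is exactly the deletion problem Hasnat mentions in the comments: without enough blanks, repeated characters get merged away.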

However, you could implement a special CTC decoder that only allows valid strings. License plates do not contain arbitrary character strings, so it may be possible to describe a license plate text with a regular expression. You could then adapt a CTC decoding algorithm such as beam search decoding (1) to keep only those beams that describe valid license plates.

(1) you could use my decoder as a starting point, throw away the language model and instead score the beams by a regular expression: https://github.com/githubharald/CTCDecoder/blob/master/src/BeamSearch.py
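
One way to enforce such a pattern, sketched under made-up assumptions: represent the plate pattern as a DFA (a dict of state transitions) and, when extending a beam, allow only the characters that keep the prefix on a path through the DFA. The toy pattern below (two letters from {A, B} followed by two digits from {0, 1}) is purely illustrative, not a real plate format, and this is only the filtering step, not a full beam search.

```python
LETTERS = set("AB")
DIGITS = set("01")

# DFA as {state: {char: next_state}}; state 4 is the accepting state.
DFA = {
    0: {c: 1 for c in LETTERS},
    1: {c: 2 for c in LETTERS},
    2: {c: 3 for c in DIGITS},
    3: {c: 4 for c in DIGITS},
    4: {},  # complete plate: no further extension allowed
}

def allowed_extensions(prefix):
    """Return the set of characters that may legally extend a beam prefix."""
    state = 0
    for ch in prefix:
        if ch not in DFA[state]:
            return set()  # prefix already violates the pattern
        state = DFA[state][ch]
    return set(DFA[state])

print(allowed_extensions("AB"))    # {'0', '1'}: only digits may follow
print(allowed_extensions("AB01"))  # set(): the plate is complete
```

In a beam search decoder, you would call something like `allowed_extensions` at each step instead of iterating over the whole alphabet, so invalid beams are never created in the first place.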

Harry
  • Is it possible to provide an example of how to quickly replace the LM with a regular expression in your Python script? – Hasnat Nov 26 '18 at 12:13
  • No, there's no "quick" way I am aware of. You have to translate a regular expression to a DFA. Instead of extending a beam by all possible characters, you have to query the DFA to see which characters are allowed to eventually form a valid beam-text. – Harry Nov 26 '18 at 14:08