Optical Character Recognition (OCR) for Old Torah Manuscripts

In this work, two Optical Character Recognition (OCR) architectures were demonstrated for the Solitreo font of the Hebrew language. The first architecture was based on text detection and classification. The second was based on cropping a document into separate text rows and translating each whole row using an LSTM network. In addition, algorithms for processing handwritten text documents were demonstrated, including binarization, connected components analysis, and text row detection. Some of these algorithms reduce input dimensionality and thereby help achieve improved results; others are crucial for intermediate handwritten text processing tasks. In the full report we also present the reasons for using these algorithms, their pros and cons, their performance on the Solitreo font of the Hebrew language, and performance comparisons.
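
To make the preprocessing steps named above concrete, the following is a minimal sketch in plain Python of binarization, connected components analysis, and text row detection on a tiny grayscale grid. It is an illustration under stated assumptions, not the method used in this work: the fixed global threshold, the 4-connectivity, the projection-profile row detector, and all function names (`binarize`, `connected_components`, `detect_text_rows`) are hypothetical choices made for the example.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Global threshold (assumed value): 1 = ink (dark pixel), 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary):
    """Find 4-connected ink regions with a breadth-first flood fill.

    Returns bounding boxes (top, left, bottom, right), inclusive,
    in the order the regions are met during a row-major scan.
    """
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def detect_text_rows(binary, min_ink=1):
    """Horizontal projection profile: group consecutive pixel rows that
    contain at least `min_ink` ink pixels into (start, end) ranges, inclusive."""
    rows = []
    start = None
    for i, line in enumerate(binary):
        has_ink = sum(line) >= min_ink
        if has_ink and start is None:
            start = i
        elif not has_ink and start is not None:
            rows.append((start, i - 1))
            start = None
    if start is not None:
        rows.append((start, len(binary) - 1))
    return rows


# Toy 5x5 "page": 255 = white paper, low values = ink.
page = [
    [255, 255, 255, 255, 255],
    [255,   0,   0, 255,  10],
    [255,   0, 255, 255, 255],
    [255, 255, 255, 255, 255],
    [255,  20, 255,  30, 255],
]
binary = binarize(page)
print(detect_text_rows(binary))      # pixel-row ranges that hold text
print(connected_components(binary))  # one bounding box per ink blob
```

In a pipeline of the second kind described above, each detected row range would be cropped and passed to the row-level recognizer, while the bounding boxes would feed a detection-and-classification pipeline of the first kind. Real manuscript images would likely need an adaptive threshold and a tolerance for touching or skewed rows, which this sketch does not attempt.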
