Surya: Revolutionizing Multilingual Document OCR with Line-Level Detection

Jimmy W.

January 16, 2024

Meet Surya: A Multilingual Text Line Detection AI Model for Documents

In a recent tweet, Vik Paruchuri, the founder of Dataquest.io, announced the launch of Surya, a multilingual document OCR toolkit. The framework efficiently detects line-level bounding boxes (bboxes) and column breaks in documents, scanned images, and presentations.

Surya is an encoder-decoder model: it takes an image of a document page as input and outputs the same image with boxes drawn around each detected text line. The initial layers of the decoder are based on SegFormer, a transformer for semantic segmentation, while the final part of the decoder consists of a 2D convolutional layer with batch normalization. Before an image or PDF is processed, each page is split into segments that fit the model's maximum image dimension and goes through several pre-processing steps.
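The article does not describe the pre-processing in detail, so the following is only a minimal sketch of what splitting a page into model-sized segments could look like. It assumes the page is first scaled so its width equals a hypothetical model input size `MAX_DIM` (1024 here, chosen arbitrarily), then cut into horizontal strips no taller than that size, and that line boxes found in each strip are shifted back into page coordinates. `MAX_DIM`, `scaled_size`, `split_page`, and `strip_box_to_page` are illustrative names, not part of Surya's API.

```python
from typing import List, Tuple

# Hypothetical model input size (an assumption); Surya's real value may differ.
MAX_DIM = 1024

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def scaled_size(width: int, height: int, max_dim: int = MAX_DIM) -> Tuple[int, int]:
    """Scale the page so its width equals max_dim, preserving the aspect ratio."""
    scale = max_dim / width
    return max_dim, round(height * scale)


def split_page(height: int, max_dim: int = MAX_DIM) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges covering [0, height), each at most max_dim rows tall."""
    if height <= 0 or max_dim <= 0:
        raise ValueError("height and max_dim must be positive")
    return [(top, min(top + max_dim, height)) for top in range(0, height, max_dim)]


def strip_box_to_page(box: Box, strip_top: int) -> Box:
    """Shift a box detected inside a strip back into full-page coordinates."""
    x0, y0, x1, y1 = box
    return (x0, y0 + strip_top, x1, y1 + strip_top)


if __name__ == "__main__":
    # A US Letter page scanned at 200 DPI is 1700 x 2200 pixels.
    w, h = scaled_size(1700, 2200)
    print((w, h))                                                  # (1024, 1325)
    print(split_page(h))                                           # [(0, 1024), (1024, 1325)]
    print(strip_box_to_page((10, 5, 500, 30), strip_top=1024))     # (10, 1029, 500, 1054)
```

Cutting along the height keeps every strip at the width the (assumed) model expects. A line box that straddles a strip boundary would additionally need to be merged across strips, which this sketch leaves out.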

To evaluate bbox accuracy, the researchers measured precision and recall based on area coverage rather than the traditional IoU (intersection over union) metric. Precision measures how well the predicted bboxes cover the ground truth bboxes, and recall measures how well the ground truth bboxes cover the predicted bboxes. In experiments comparing Surya with Tesseract, Surya's precision was much higher, while Tesseract's recall was slightly higher; overall, Surya outperformed Tesseract. Surya also has the advantage of running on both CPU and GPU, and it is much faster than Tesseract.
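A small sketch can make the coverage-based metric concrete. This is not Surya's benchmark code. It assumes axis-aligned boxes given as `(x0, y0, x1, y1)` and defines the coverage of a box as the fraction of its area that lies inside the union of the other set's boxes, averaged over all boxes. Following the article's wording literally, precision is taken as how well predictions cover the ground truth, and recall as the reverse. The helper names (`mean_coverage`, `union_area`, `intersect`, `iou`) are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), with x0 < x1 and y0 < y1


def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Return the overlapping rectangle of a and b, or None if they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def union_area(rects: List[Box]) -> float:
    """Exact area of the union of rectangles, using coordinate compression."""
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    total = 0.0
    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            # Grid lines come from rectangle edges, so each cell lies entirely inside or
            # entirely outside every rectangle; testing the cell's midpoint is enough.
            cx, cy = (xs[i] + xs[i + 1]) / 2, (ys[j] + ys[j + 1]) / 2
            if any(r[0] < cx < r[2] and r[1] < cy < r[3] for r in rects):
                total += (xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j])
    return total


def mean_coverage(targets: List[Box], covers: List[Box]) -> float:
    """Average fraction of each target box's area covered by the union of `covers`."""
    if not targets:
        return 0.0
    fractions = []
    for t in targets:
        pieces = [p for p in (intersect(t, c) for c in covers) if p is not None]
        fractions.append(union_area(pieces) / area(t))
    return sum(fractions) / len(fractions)


def iou(a: Box, b: Box) -> float:
    """Classic intersection over union, shown for comparison."""
    inter = intersect(a, b)
    ia = area(inter) if inter else 0.0
    return ia / (area(a) + area(b) - ia)


if __name__ == "__main__":
    ground_truth = [(0, 0, 100, 10)]                # one text line
    predicted = [(0, 0, 40, 10), (40, 0, 100, 10)]  # the same line split into two boxes
    # Directions follow the article's wording (an assumption about Surya's exact definition).
    precision = mean_coverage(ground_truth, predicted)
    recall = mean_coverage(predicted, ground_truth)
    print(precision, recall)                                       # 1.0 1.0
    print([round(iou(p, ground_truth[0]), 2) for p in predicted])  # [0.4, 0.6]
```

In this toy case one ground-truth line is detected as two adjacent boxes. Per-box IoU scores the pieces at only 0.4 and 0.6, while both coverage scores stay at 1.0 because the pieces together cover the line exactly. Taking the union also prevents overlapping boxes from counting the same area twice.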

Named after the Hindu sun god, Surya has been shown to work on many languages and is expected to work on nearly all of them. Because it is specialized for documents, its main limitation is that it is unlikely to work well on photos or other natural images. Experiments also show that it performs poorly on images that resemble ads. Despite these limitations, the model is highly useful and could be further extended to text detection, table detection, and chart detection.
