A segmentation-based optical character recognition system for Bangla printed text
Mahir Mahbub, Ahmedul Kabir
Abstract
Bangla ranks as the fifth most spoken language globally, which has driven significant interest in the development of Bangla optical character recognition (OCR) systems. The intricate structure of the Bangla script, including compound characters, modifiers, and headlines, complicates the formation of words. This research introduces a complete OCR system pipeline for printed Bangla text. It employs a thinning-based segmentation approach combined with a convolutional neural network (CNN) to recognize Bangla fonts. Additionally, a part-of-speech (POS)-aware spell checker is proposed that automatically corrects misspelled words while considering their context within the sentence. We introduce semi-generalized filters that address conjunct formation challenges in Bangla OCR; this flexible design allows the system to adapt to new fonts. The ResNet50 model is used to recognize segmented characters and modifiers. We achieve a character segmentation error of 3.354% and an overall segmentation error of 2.332%. The ResNet50 recognition model achieves an accuracy of 98.345%.
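The abstract does not detail the thinning-based segmentation itself. Purely as an illustration of why the headline complicates character segmentation, the sketch below uses a simpler projection-profile approach, which is a swapped-in technique and not the authors' method. The function names `find_headline_row` and `segment_columns` and the toy image `word` are hypothetical.

```python
# Illustrative sketch only: projection-profile segmentation on a tiny binary
# image (1 = ink, 0 = background). This is NOT the thinning-based method
# described in the abstract; all names here are hypothetical.

def find_headline_row(img):
    """Return the index of the row with the most ink (assumed to be the headline)."""
    row_sums = [sum(row) for row in img]
    return row_sums.index(max(row_sums))

def segment_columns(img, skip_rows=()):
    """Return (start, end) column spans separated by blank columns.

    Rows listed in skip_rows are ignored when testing a column for ink;
    `end` is exclusive.
    """
    height, width = len(img), len(img[0])
    spans, start = [], None
    for x in range(width):
        ink = any(img[y][x] for y in range(height) if y not in skip_rows)
        if ink and start is None:
            start = x                      # a new glyph begins
        elif not ink and start is not None:
            spans.append((start, x))       # the glyph ended at a blank column
            start = None
    if start is not None:
        spans.append((start, width))       # glyph touching the right edge
    return spans

# Toy word image: the top row is a headline joining two glyphs.
word = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
]

headline = find_headline_row(word)
with_headline = segment_columns(word)
without_headline = segment_columns(word, skip_rows={headline})
```

With the headline left in, the whole word forms a single span; ignoring the headline row lets the blank columns between glyphs separate them. Real printed Bangla text also has conjuncts and modifiers that overlap between columns, which a plain projection profile does not handle.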
Keywords
masked word prediction; optical character recognition; pattern recognition; spell checker; textual image segmentation
DOI:
http://doi.org/10.12928/telkomnika.v24i3.26961
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
TELKOMNIKA Telecommunication, Computing, Electronics and Control
ISSN: 1693-6930, e-ISSN: 2302-9293
Universitas Ahmad Dahlan, 4th Campus, Jl. Ringroad Selatan, Kragilan, Tamanan, Banguntapan, Bantul, Yogyakarta, Indonesia 55191
Phone: +62 (274) 563515, 511830, 379418, 371120
Fax: +62 274 564604