Fringe Map Based Text Line Segmentation of Printed Telugu Document Images

Cited by: 5
Authors
Koppula, Vijaya Kumar [1]
Negi, Atul [2]
Affiliations
[1] CMR Coll Engn & Technol, Dept CSE, Hyderabad 501401, Andhra Pradesh, India
[2] Univ Hyderabad, Dept CIS, Hyderabad 500046, Andhra Pradesh, India
Keywords
Text line segmentation; Indic scripts; Telugu OCR; Fringe Maps
DOI
10.1109/ICDAR.2011.260
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text line segmentation is a crucial step that can greatly influence the accuracy of an OCR system. One of the major obstacles to building high-accuracy OCR systems for Indic scripts has been the text line segmentation problem. For Telugu script in particular, this problem has not yet been adequately addressed by research. The common methods for Roman script are not applicable because of the inherent script complexity of Telugu. Previous approaches to Telugu OCR in the literature take a simplified view of the problem, leading to errors in line segmentation. The problem is compounded in old documents that are typeset manually and have non-uniform print quality. In this work we propose a new method using the fringe map concept. In a fringe map, each pixel of the binary image is associated with a fringe number that denotes the distance to the nearest black pixel. We use fringe value information to segment text lines. First, we locate peak fringe numbers (PFNs). PFNs that are not between lines are filtered out. PFNs between adjacent lines are used to construct a region. The segmenting path between the adjacent lines is found by joining the filtered PFNs of a region.
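To make the fringe map idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not state which distance metric is used or how a peak is defined, so the sketch assumes city-block (4-neighbour) distance and treats a pixel as a candidate peak when its fringe number is positive and not smaller than the values directly above and below it. The function names `fringe_map` and `column_peaks` are hypothetical, and the filtering and path-joining steps are not covered.

```python
from collections import deque


def fringe_map(image):
    """Return the fringe number of every pixel in a binary image.

    image is a list of rows with 1 for black (ink) and 0 for white.
    Assumption: distance is city-block distance to the nearest black
    pixel, computed by a multi-source breadth-first search.
    Pixels stay None if the image contains no black pixel at all.
    """
    rows, cols = len(image), len(image[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    # Every black pixel is a search source with fringe number 0.
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1:
                dist[r][c] = 0
                queue.append((r, c))
    # Expand outward one step at a time over the 4-neighbourhood.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def column_peaks(fringe):
    """Return (row, col) positions of candidate peak fringe numbers.

    Assumption (not from the abstract): a peak is a white pixel whose
    fringe number is at least as large as the values directly above
    and below it. The first and last rows are skipped.
    """
    peaks = []
    for r in range(1, len(fringe) - 1):
        for c in range(len(fringe[0])):
            value = fringe[r][c]
            if value > 0 and value >= fringe[r - 1][c] and value >= fringe[r + 1][c]:
                peaks.append((r, c))
    return peaks
```

On an image with two solid black rows separated by a white gap, `column_peaks(fringe_map(image))` returns the pixels of the middle white row, which is where a segmenting path between the two lines would be expected to run.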
Pages: 1294-1298
Page count: 5