Script-Independent Text Segmentation from Document Images

Authors
Sahare P. [1]
Tembhurne J.V. [1]
Parate M.R. [1]
Diwan T. [1]
Dhok S.B. [2]
Affiliations
[1] Indian Institute of Information Technology, Nagpur
[2] Visvesvaraya National Institute of Technology, Nagpur
Keywords
Document Handling; Fast Marching Method; Image Texture Analysis; Text Processing; Text-Line Segmentation; Word Segmentation
DOI
10.4018/IJACI.313967
Abstract
Document image analysis is widely used for information retrieval in digital applications, including optical character recognition (OCR), indexing of digital libraries, and web image processing. Text segmentation is one of the important steps in this field, and it becomes complicated for documents that contain unevenly spaced text and characters of varying font sizes. This paper presents script-independent algorithms for text-line segmentation and word segmentation. The fast marching method is used for text-line segmentation, where it acts as a region-growing process that detects potential text-lines. Word segmentation combines the wavelet transform with connected components (CCs) labeling: an energy map is calculated using the wavelet transform to create text-blocks. Both proposed algorithms are evaluated on different databases containing documents in different scripts, and the highest text-line and word segmentation accuracies obtained are 98.9% and 99.1%, respectively. Copyright © 2022, IGI Global.
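The word segmentation idea in the abstract (a wavelet energy map that forms text-blocks, followed by connected components labeling) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the wavelet, the threshold, or how blocks are formed, so the one-level Haar transform, the `threshold` and `gap` parameters, and the row-wise gap filling below are all assumptions chosen for illustration. The fast marching step used for text-lines is not shown.

```python
from collections import deque

def haar_energy(img):
    """Detail energy (LH^2 + HL^2 + HH^2) of a one-level Haar transform,
    computed per 2x2 pixel block. Assumed wavelet; the paper may differ."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    energy = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            a, b = img[2 * r][2 * c], img[2 * r][2 * c + 1]
            d, e = img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1]
            lh = (a + b - d - e) / 2.0  # difference between the two rows
            hl = (a - b + d - e) / 2.0  # difference between the two columns
            hh = (a - b - d + e) / 2.0  # diagonal difference
            energy[r][c] = lh * lh + hl * hl + hh * hh
    return energy

def fill_row_gaps(mask, gap):
    """Fill background runs of at most `gap` cells lying between two
    foreground cells in the same row (hypothetical block-forming step)."""
    out = [row[:] for row in mask]
    for r, row in enumerate(mask):
        last = None
        for c, on in enumerate(row):
            if on:
                if last is not None and 0 < c - last - 1 <= gap:
                    for k in range(last + 1, c):
                        out[r][k] = True
                last = c
    return out

def label_boxes(mask):
    """4-connected component labeling; returns one bounding box
    (top, left, bottom, right) per component, in scan order."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def segment_words(img, threshold=0.5, gap=2):
    """Return word boxes in pixel coordinates for a binary image
    given as a list of rows (1 = ink, 0 = background)."""
    energy = haar_energy(img)
    mask = [[e > threshold for e in row] for row in energy]
    blocks = fill_row_gaps(mask, gap)
    # Each energy cell covers a 2x2 pixel block, so scale boxes back up.
    return [(2 * t, 2 * l, 2 * b + 1, 2 * r + 1)
            for t, l, b, r in label_boxes(blocks)]

# Toy page: two "words", each made of two vertical strokes.
page = [[1 if c in (2, 6, 20, 24) else 0 for c in range(32)] for _ in range(4)]
print(segment_words(page))
```

On the toy page the strokes within a word are close enough to be merged by the gap filling, while the wider space between the two words keeps them as separate components. Note that a flat region, whether all ink or all background, has zero detail energy, which is why the sketch relies on stroke edges rather than ink density.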
Related papers
50 in total
  • [21] Coupled snakelets for curled text-line segmentation from warped document images
    Bukhari, Syed Saqib
    Shafait, Faisal
    Breuel, Thomas M.
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2013, 16 (01) : 33 - 53
  • [23] Signature Segmentation from Document Images
    Ahmed, Sheraz
    Malik, Muhammad Imran
    Liwicki, Marcus
    Dengel, Andreas
    13TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR 2012), 2012, : 425 - 429
  • [24] Text region extraction and text segmentation on camera-captured document style images
    Song, YJ
    Kim, KC
    Choi, YW
    Byun, HR
    Kim, SH
    Chi, SY
    Jang, DK
    Chung, YK
    EIGHTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, PROCEEDINGS, 2005, : 172 - 176
  • [25] Segmentation of text from compound images
    Krishnan, N.
    Babu, C. Nelson Kennedy
    Ravi, S.
    Thavamani, Josphine
    ICCIMA 2007: INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND MULTIMEDIA APPLICATIONS, VOL III, PROCEEDINGS, 2007, : 526 - +
  • [26] Script Independent Scene Text Segmentation using Fast Stroke Width Transform and GrabCut
    Bosamiya, Jay H.
    Agrawal, Palash
    Roy, Partha Pratim
    Balasubramanian, R.
    PROCEEDINGS 3RD IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION ACPR 2015, 2015, : 151 - 155
  • [27] Text and Script Independent Writer Identification
    Dhandra, B. V.
    Vijayalaxmi, M. B.
    2014 INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2014, : 586 - 590
  • [28] Text Line Segmentation in Handwritten Document Images Using Tensor Voting
    Toan Dinh Nguyen
    Gueesang Lee
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2011, E94A (11) : 2434 - 2441
  • [29] A multi-plane approach for text segmentation of complex document images
    Chen, Yen-Lin
    Wu, Bing-Fei
    PATTERN RECOGNITION, 2009, 42 (07) : 1419 - 1444
  • [30] Semantic Segmentation of Printed Text from Marathi Document Images using Deep Learning Methods
    Akhter, Shaheera Saba Mohd Naseem
    Rege, Priti P.
    2019 IEEE 16TH INDIA COUNCIL INTERNATIONAL CONFERENCE (IEEE INDICON 2019), 2019,