Script-Independent Text Segmentation from Document Images

Authors
Sahare P. [1]
Tembhurne J.V. [1]
Parate M.R. [1]
Diwan T. [1]
Dhok S.B. [2]
Affiliations
[1] Indian Institute of Information Technology, Nagpur
[2] Visvesvaraya National Institute of Technology, Nagpur
Keywords
Document Handling; Fast Marching Method; Image Texture Analysis; Text Processing; Text-Line Segmentation; Word Segmentation
DOI
10.4018/IJACI.313967
Abstract
Document image analysis is widely used for information retrieval, with applications including optical character recognition (OCR), digital library indexing, and web image processing. Text segmentation is an important step in this field, and it becomes difficult when documents contain unevenly spaced text and characters of varying font sizes. This paper presents script-independent algorithms for text-line segmentation and word segmentation. Text-line segmentation uses the fast marching method as a region-growing process that detects potential text lines. Word segmentation uses a wavelet transform with connected-component (CC) labeling: an energy map computed from the wavelet transform is used to create text blocks. Both algorithms are evaluated on several databases containing documents in different scripts, achieving highest text-line and word segmentation accuracies of 98.9% and 99.1%, respectively. Copyright © 2022, IGI Global.
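The word-segmentation idea in the abstract (a wavelet energy map grouped into text blocks by connected-component labeling) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not state the wavelet family, the threshold, or how blocks are merged, so the one-level Haar transform, the gap-closing step, and every name and parameter below (`haar_energy`, `close_gaps`, `energy_thresh`, `max_gap`) are assumptions made for illustration only.

```python
from collections import deque

def haar_energy(img):
    """Detail energy of a one-level 2D Haar transform, one value per 2x2 block."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    energy = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            a, b = img[2 * r][2 * c], img[2 * r][2 * c + 1]
            d, e = img[2 * r + 1][2 * c], img[2 * r + 1][2 * c + 1]
            lh = (a + b - d - e) / 2  # horizontal-edge detail
            hl = (a - b + d - e) / 2  # vertical-edge detail
            hh = (a - b - d + e) / 2  # diagonal detail
            energy[r][c] = lh * lh + hl * hl + hh * hh
    return energy

def close_gaps(mask, max_gap):
    """Fill horizontal zero runs of length <= max_gap lying between ones (assumed step)."""
    out = [row[:] for row in mask]
    for row in out:
        last = None  # column of the most recent active block in this row
        for c, v in enumerate(row):
            if v:
                if last is not None and 0 < c - last - 1 <= max_gap:
                    for k in range(last + 1, c):
                        row[k] = 1
                last = c
    return out

def label_components(mask):
    """Bounding boxes (c0, r0, c1, r1) of 4-connected components, found by BFS."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            c0, r0, c1, r1 = c, r, c, r
            while queue:
                y, x = queue.popleft()
                c0, r0 = min(c0, x), min(r0, y)
                c1, r1 = max(c1, x), max(r1, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((c0, r0, c1, r1))
    return sorted(boxes)

def segment_words(img, energy_thresh=0.5, max_gap=1):
    """Return word boxes (x0, y0, x1, y1) in pixel coordinates, inclusive."""
    energy = haar_energy(img)
    mask = [[1 if e > energy_thresh else 0 for e in row] for row in energy]
    blocks = close_gaps(mask, max_gap)
    # Each energy-map cell covers a 2x2 pixel block, so scale back to pixels.
    return [(2 * c0, 2 * r0, 2 * c1 + 1, 2 * r1 + 1)
            for c0, r0, c1, r1 in label_components(blocks)]

# Toy binary text line (1 = ink): three strokes, a wide gap, then two strokes.
line = [[int(ch) for ch in "100010001000000010001000"]] * 4
words = segment_words(line)
print(words)  # [(0, 0, 9, 3), (16, 0, 21, 3)]
```

In this toy input, strokes within a word are one block apart and are merged by `close_gaps`, while the three-block gap between the two words is left open, so connected-component labeling yields two boxes. On real document images the threshold and gap size would need to adapt to font size and spacing, which is the difficulty the abstract highlights.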