Script-Independent Text Segmentation from Document Images

Authors
Sahare P. [1]
Tembhurne J.V. [1]
Parate M.R. [1]
Diwan T. [1]
Dhok S.B. [2]
Affiliations
[1] Indian Institute of Information Technology, Nagpur
[2] Visvesvaraya National Institute of Technology, Nagpur
Keywords
Document Handling; Fast Marching Method; Image Texture Analysis; Text Processing; Text-Line Segmentation; Word Segmentation
DOI
10.4018/IJACI.313967
Abstract
Document image analysis is widely used for information retrieval, with applications that include optical character recognition (OCR), indexing of digital libraries, and web image processing. Text segmentation is an important step in this field, and it becomes difficult for documents that contain unevenly spaced text and characters of varying font sizes. This paper presents script-independent algorithms for text-line segmentation and word segmentation. Text-line segmentation uses the fast marching method as a region-growing process that detects potential text lines. Word segmentation uses a wavelet transform to compute an energy map from which text blocks are created, together with connected-component (CC) labeling. Both algorithms are evaluated on several databases containing documents in different scripts; the highest text-line and word segmentation accuracies obtained are 98.9% and 99.1%, respectively. Copyright © 2022, IGI Global.
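The word-segmentation idea in the abstract (a wavelet energy map, text blocks, then connected-component labeling) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the wavelet, the threshold, or how blocks are merged, so the one-level Haar transform, the `threshold` and `max_gap` parameters, and all function names below are assumptions chosen for illustration.

```python
from collections import deque


def haar_energy_map(img):
    """Detail energy of a one-level 2D Haar transform, one value per 2x2 block.

    img is a list of equal-length rows of grayscale values (even height/width).
    The Haar wavelet is an assumption; the abstract does not name the wavelet.
    """
    energy = []
    for y in range(0, len(img) - 1, 2):
        row = []
        for x in range(0, len(img[0]) - 1, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            d_rows = (a + b - c - d) / 2.0  # top row vs. bottom row
            d_cols = (a - b + c - d) / 2.0  # left column vs. right column
            d_diag = (a - b - c + d) / 2.0  # diagonal detail
            row.append(d_rows ** 2 + d_cols ** 2 + d_diag ** 2)
        energy.append(row)
    return energy


def close_row_gaps(mask, max_gap):
    """Fill horizontal runs of at most max_gap inactive cells between active cells."""
    out = [row[:] for row in mask]
    for row in out:
        last = None
        for x, active in enumerate(row):
            if active:
                if last is not None and 0 < x - last - 1 <= max_gap:
                    for k in range(last + 1, x):
                        row[k] = True
                last = x
    return out


def label_components(mask):
    """Return sorted bounding boxes (x0, y0, x1, y1) of 4-connected components."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return sorted(boxes)


def segment_words(img, threshold, max_gap):
    """Hypothetical pipeline: energy map -> text blocks -> CC boxes in pixel coordinates."""
    energy = haar_energy_map(img)
    mask = [[e > threshold for e in row] for row in energy]
    blocks = close_row_gaps(mask, max_gap)
    # Each energy cell covers a 2x2 pixel block, so scale the boxes back up.
    return [(2 * x0, 2 * y0, 2 * x1 + 1, 2 * y1 + 1)
            for x0, y0, x1, y1 in label_components(blocks)]


# Toy image: '#' is ink (255), '.' is background (0); two stroke groups on one line.
pattern = "#.#...#.......#.#..."
image = [[255 if ch == "#" else 0 for ch in pattern] for _ in range(4)]
words = segment_words(image, threshold=1.0, max_gap=1)
print(words)
```

On the toy image, the narrow gap inside the first stroke group is closed while the wide gap between the groups is kept, so two boxes are reported. Note that detail energy responds to intensity changes rather than to ink itself, so a uniformly filled region would produce no energy in this sketch.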