PaperDiff: A Script Independent Automatic Method for Finding The Text Differences Between Two Document Images

Cited by: 2
Authors
Ramachandrula, Sitaram [1]
Joshi, Gopal Datt [1]
Noushath, S. [1]
Parikh, Pulkit [1]
Gupta, Vishal [1]
Affiliation
[1] Hewlett Packard Labs India, Bangalore, Karnataka, India
DOI
10.1109/DAS.2008.69
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we introduce a novel concept called PaperDiff and propose an algorithm to implement it. The aim of PaperDiff is to compare two printed (paper) documents using their images and to determine the differences between them in terms of text inserted, deleted and substituted. This lets an end-user compare two documents when both are already printed, or when only one of them is printed (the other could be in electronic form, such as an MS Word *.doc file). The algorithm we propose for realizing PaperDiff is based on word image comparison. It is therefore suitable for symbol strings and for any script or language (including multiple scripts) in the documents, cases where even mature optical character recognition (OCR) technology has had very little success. PaperDiff enables end-users such as lawyers and novelists to compare new versions of a document with older ones. The proposed method remains suitable when the content is formatted differently in the two input documents, so that the structures of the document images differ (e.g., differing page widths or page structure). An experiment with PaperDiff on single-column text documents yielded 99.2% accuracy in detecting 135 induced differences across 10 pairs of documents.
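
The abstract does not spell out how the word images are matched or aligned, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each page has already been segmented into a reading-order list of binary word images (which is what would make the comparison independent of page width), uses a hypothetical column-density feature in `same_word`, and swaps in a standard edit-distance alignment to label insertions, deletions and substitutions. All function names, the feature, and the `tol` threshold are assumptions made for illustration.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

WordImage = List[List[int]]  # assumed representation: rows of 0/1 pixels
Op = Tuple[str, Optional[int], Optional[int]]  # (kind, index in old, index in new)


def profile(img: WordImage, bins: int = 8) -> List[float]:
    """Hypothetical feature: per-column ink density resampled to `bins` values."""
    height, width = len(img), len(img[0])
    cols = [sum(row[x] for row in img) / height for x in range(width)]
    out = []
    for b in range(bins):
        lo = b * width // bins
        hi = max(lo + 1, (b + 1) * width // bins)
        chunk = cols[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out


def same_word(a: WordImage, b: WordImage, tol: float = 0.15) -> bool:
    """Treat two word images as equal if their profiles are close (assumed rule)."""
    pa, pb = profile(a), profile(b)
    return sum(abs(x - y) for x, y in zip(pa, pb)) / len(pa) <= tol


def paper_diff(old: Sequence[Any], new: Sequence[Any],
               same: Callable[[Any, Any], bool] = same_word) -> List[Op]:
    """Align two word sequences by edit distance and list the differences."""
    n, m = len(old), len(new)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if same(old[i - 1], new[j - 1]) else 1
            cost[i][j] = min(cost[i - 1][j - 1] + mismatch,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)
    # Walk back from the end of both sequences to recover the operations.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = 0 if same(old[i - 1], new[j - 1]) else 1
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                if mismatch:
                    ops.append(("substitute", i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("delete", i - 1, None))
            i -= 1
        else:
            ops.append(("insert", None, j - 1))
            j -= 1
    ops.reverse()
    return ops
```

Because the comparison function is passed in as `same`, the alignment can be tried on plain strings first, for example `paper_diff("the quick brown fox".split(), "the slow brown fox jumps".split(), same=lambda a, b: a == b)`, before substituting a real word-image matcher.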
Pages: 585-590
Page count: 6