PaperDiff: A Script Independent Automatic Method for Finding The Text Differences Between Two Document Images

Cited by: 2

Authors:

Ramachandrula, Sitaram [1]

Joshi, Gopal Datt [1]

Noushath, S. [1]

Parikh, Pulkit [1]

Gupta, Vishal [1]

Affiliation:

[1] Hewlett Packard Labs India, Bangalore, Karnataka, India

Source:

PROCEEDINGS OF THE 8TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS | 2008

Keywords:

DOI:

10.1109/DAS.2008.69

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we introduce a novel concept called PaperDiff and propose an algorithm to implement it. The aim of PaperDiff is to compare two printed (paper) documents using their images and determine the differences between them in terms of text inserted, deleted and substituted. This lets an end-user compare two documents that are both already printed, or of which only one is printed (the other could be in electronic form, such as an MS Word *.doc file). The algorithm we propose for realizing PaperDiff is based on word image comparison and is suitable even for symbol strings and for any script or language (including multiple scripts) in the documents, where even mature optical character recognition (OCR) technology has had very little success. PaperDiff helps end-users such as lawyers and novelists compare new versions of a document with older ones. Our proposed method is suitable even when the content is formatted differently in the two input documents, so that the structures of the document images differ (e.g., differing page widths or page structure). An experiment with PaperDiff on single-column text documents yielded 99.2% accuracy in detecting 135 induced differences in 10 pairs of documents.
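The abstract does not describe the algorithm in detail, so the following is only a minimal sketch of the general idea of reporting insertions, deletions and substitutions from word-image comparison, not the authors' implementation. It assumes each word image has already been segmented and reduced to a fixed-length feature vector. The names `word_distance`, `diff_words` and `match_threshold` are hypothetical. It also assumes a standard edit-distance alignment over the two word sequences, which is a swapped-in, textbook technique.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: one feature vector per segmented word image.
Word = Sequence[float]
Op = Tuple[str, Optional[int], Optional[int]]


def word_distance(a: Word, b: Word) -> float:
    """Mean absolute difference between two feature vectors.

    Vectors of different length (or empty ones) count as maximally different.
    """
    if len(a) != len(b) or not a:
        return 1.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def diff_words(old: List[Word], new: List[Word],
               match_threshold: float = 0.1) -> List[Op]:
    """Align two word sequences by edit distance and list the differences.

    Returns tuples (operation, index_in_old, index_in_new), where operation
    is "insert", "delete" or "substitute". No text is recognized: two words
    match when their image features are within `match_threshold`.
    """
    def same(i: int, j: int) -> bool:
        return word_distance(old[i], new[j]) <= match_threshold

    n, m = len(old), len(new)
    # cost[i][j] = minimum number of edits turning old[:i] into new[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (0 if same(i - 1, j - 1) else 1)
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to recover the edit operations.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            matched = same(i - 1, j - 1)
            if cost[i][j] == cost[i - 1][j - 1] + (0 if matched else 1):
                if not matched:
                    ops.append(("substitute", i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("delete", i - 1, None))
            i -= 1
        else:
            ops.append(("insert", None, j - 1))
            j -= 1
    ops.reverse()
    return ops
```

Because the alignment runs over the word sequence and not over page positions, a sketch like this is indifferent to where line breaks fall, which is consistent with the abstract's claim that differing page widths do not prevent comparison. A real system would still need word segmentation and a robust image-matching measure, both of which are outside this illustration.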

Pages: 585-590

Page count: 6
