Digitization of Text Documents Using PDF/A

Times cited: 3

Authors:

Han, Yan [1]

Wan, Xueheng [2]

Affiliations:

[1] Univ Arizona Lib, Tucson, AZ 85721 USA

[2] Univ Arizona, Dept Comp Sci, Tucson, AZ 85721 USA

Source:

INFORMATION TECHNOLOGY AND LIBRARIES | 2018 / Vol. 37 / No. 01

DOI:

10.6017/ITAL.V37I1.9878

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

The purpose of this article is to demonstrate a practical use case of PDF/A for the digitization of text documents, following FADGI's recommendation of PDF/A as a preferred digitization file format. The authors demonstrate how to convert TIFF images and their associated metadata and combine them into a single PDF/A-2b file per document. Using real-life examples and open source software, the authors show readers how to convert TIFF images, extract associated metadata and International Color Consortium (ICC) profiles, and validate the result against the newly released PDF/A validator. The generated PDF/A file is a self-contained, self-describing container that holds all the data from the digitization of textual materials, including page-level metadata and ICC profiles. Through theoretical analysis and empirical examples, the authors show that PDF/A has many advantages over TIFF and JPEG 2000, the traditionally preferred file formats, for the digitization of text documents.
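The abstract does not name the open source tools the authors use. To illustrate only the "extract associated metadata and ICC profiles" step, here is a minimal standard-library Python sketch, assuming classic (non-BigTIFF) TIFF input and reading only the first image file directory (IFD). The function name `read_first_ifd_tags` and the chosen tag subset are hypothetical illustrations, not the authors' toolchain; tag 34675 (InterColorProfile) is the standard TIFF tag for an embedded ICC profile, whose bytes would need to travel with the page image into the PDF/A file.

```python
import struct

# Hypothetical subset of TIFF tags to pull out of a scanned page
TAG_NAMES = {
    270: "ImageDescription",
    305: "Software",
    306: "DateTime",
    315: "Artist",
    34675: "InterColorProfile",  # embedded ICC profile (UNDEFINED bytes)
}

# Size in bytes of one value for each TIFF 6.0 field type
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def read_first_ifd_tags(data: bytes) -> dict:
    """Return selected tags from the first IFD of a classic TIFF (sketch only)."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel")
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (count,) = struct.unpack(endian + "H", data[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        entry = data[start:start + 12]
        tag, typ, n = struct.unpack(endian + "HHI", entry[:8])
        if tag not in TAG_NAMES or typ not in TYPE_SIZES:
            continue
        size = TYPE_SIZES[typ] * n
        if size <= 4:
            raw = entry[8:8 + size]  # small values are stored inline
        else:
            (off,) = struct.unpack(endian + "I", entry[8:12])
            raw = data[off:off + size]  # larger values live at an offset
        if typ == 2:  # ASCII strings are NUL-terminated
            tags[TAG_NAMES[tag]] = raw.rstrip(b"\x00").decode("ascii", "replace")
        else:
            tags[TAG_NAMES[tag]] = raw  # e.g. raw ICC profile bytes
    return tags
```

A production workflow would normally rely on an established imaging library and a PDF/A-aware writer rather than hand-parsing; the sketch only shows where the ICC profile and descriptive tags sit inside a TIFF.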

Pages: 52-64

Page count: 13