Digitization of Text Documents Using PDF/A

Cited by: 3
Authors
Han, Yan [1]
Wan, Xueheng [2]
Affiliations
[1] University of Arizona Libraries, Tucson, AZ 85721 USA
[2] University of Arizona, Department of Computer Science, Tucson, AZ 85721 USA
DOI
10.6017/ITAL.V37I1.9878
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The purpose of this article is to demonstrate a practical use case of PDF/A for the digitization of text documents, following the recommendation of the Federal Agencies Digital Guidelines Initiative (FADGI) to use PDF/A as a preferred digitization file format. The authors demonstrate how to convert and combine a document's TIFF images and their associated metadata into a single PDF/A-2b file. Using real-life examples and open-source software, the authors show readers how to convert TIFF images, extract the associated metadata and International Color Consortium (ICC) profiles, and validate the result against the newly released PDF/A validator. The generated PDF/A file is a self-contained, self-describing container that holds all the data from the digitization of textual materials, including page-level metadata and ICC profiles. Through theoretical analysis and empirical examples, the authors show that PDF/A has many advantages over the traditionally preferred file formats, TIFF and JPEG 2000, for the digitization of text documents.
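The abstract does not reproduce the authors' tooling. As a minimal sketch of the "extract associated metadata and ICC profiles" step, assuming classic (non-BigTIFF) baseline TIFF page images, the Python function below reads the first image file directory (IFD0) using only the standard library and returns a few descriptive ASCII tags plus the raw bytes of the embedded ICC profile (TIFF tag 34675, InterColorProfile). The function name `read_tiff_metadata` and the chosen tag subset are hypothetical and illustrative, not the authors' method; the returned profile bytes are the kind of input a PDF/A-2b writer would embed as an ICC-based color space or output intent.

```python
import struct

# Baseline TIFF tags (field type 2, ASCII) that often carry descriptive
# metadata for a scanned page. This subset is an illustrative assumption.
ASCII_TAGS = {
    270: "ImageDescription",
    271: "Make",
    272: "Model",
    305: "Software",
    306: "DateTime",
    315: "Artist",
    33432: "Copyright",
}
ICC_PROFILE_TAG = 34675  # InterColorProfile: raw ICC profile bytes
ASCII_TYPE = 2


def read_tiff_metadata(data: bytes) -> dict:
    """Read IFD0 of a classic TIFF and return ASCII tags plus the ICC profile.

    Hypothetical helper for illustration only: it ignores later IFDs (further
    pages), BigTIFF, and every tag not listed above.
    """
    # Bytes 0-1 give the byte order: "II" little-endian, "MM" big-endian.
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")

    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    result = {"icc_profile": None}
    for i in range(count):
        # Each IFD entry is 12 bytes: tag, type, count, value-or-offset.
        start = ifd_offset + 2 + 12 * i
        tag, ftype, n = struct.unpack(order + "HHI", data[start:start + 8])
        if tag != ICC_PROFILE_TAG and tag not in ASCII_TAGS:
            continue
        # Assumes 1-byte field types (ASCII, BYTE, UNDEFINED), so the value
        # spans n bytes; values of 4 bytes or fewer are stored inline.
        if n <= 4:
            value = data[start + 8:start + 8 + n]
        else:
            (offset,) = struct.unpack(order + "I", data[start + 8:start + 12])
            value = data[offset:offset + n]
        if tag == ICC_PROFILE_TAG:
            result["icc_profile"] = value
        elif ftype == ASCII_TYPE:
            # TIFF ASCII is nominally 7-bit; latin-1 never fails on stray bytes.
            result[ASCII_TAGS[tag]] = value.rstrip(b"\x00").decode("latin-1")
    return result
```

Called on the bytes of one scanned page, for example `read_tiff_metadata(open("page-001.tif", "rb").read())` with a hypothetical file name, it returns a dict of the tags found plus `icc_profile`. A production workflow would more likely rely on an established library (Pillow, for instance, exposes the profile as `Image.open(path).info.get("icc_profile")`) and would then check the assembled PDF/A file with a dedicated PDF/A validator, since this sketch does no validation.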
Pages: 52 - 64
Page count: 13
Related papers
  • [31] Text Document Summarization Using POS tagging for Kannada Text Documents
    Jayashree, R.
    Anami, Basavaraj S.
    Poornima, B. K.
    2021 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2021), 2021, : 423 - 426
  • [32] Blind digital watermarking in PDF documents using Spread Transform Dither Modulation
    Bitar, Ahmad W.
    Darazi, Rony
    Couchot, Jean-Francois
    Couturier, Raphael
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (01) : 143 - 161
  • [33] DETECTING MALICIOUS PDF DOCUMENTS USING SEMI-SUPERVISED MACHINE LEARNING
    Jiang, Jianguo
    Song, Nan
    Yu, Min
    Chow, Kam-Pui
    Li, Gang
    Liu, Chao
    Huang, Weiqing
    ADVANCES IN DIGITAL FORENSICS XVII, 2021, 612 : 135 - 155
  • [34] PDF Accessibility Checker (PAC 2): The First Tool to Test PDF Documents for PDF/UA Compliance
    Uebelbacher, Andreas
    Bianchetti, Roberto
    Riesch, Markus
    COMPUTERS HELPING PEOPLE WITH SPECIAL NEEDS, ICCHP 2014, PT I, 2014, 8547 : 197 - 201
  • [36] Automatic Paragraph Detection for Accessible PDF Documents
    Darvishy, Alireza
    Nevill, Mark
    Hutter, Hans-Peter
    COMPUTERS HELPING PEOPLE WITH SPECIAL NEEDS, ICCHP 2016, PT I, 2016, 9758 : 367 - 372
  • [37] A practical approach on clustering malicious PDF documents
    Vatamanu, Cristina
    Gavriluţ, Dragoş
    Benchea, Răzvan
    Journal in Computer Virology, 2012, 8 (4) : 151 - 163
  • [38] Information Steganography algorithm based on PDF documents
    Zhong, Shangping
    Chen, Tierui
    Jisuanji Gongcheng/Computer Engineering, 2006, 32 (03) : 161 - 163
  • [39] Investigating Accessibility Problems of Arabic PDF Documents
    AlMasoud, Ameera M.
    Al-Khalifa, Hend S.
    2013 FOURTH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY AND ACCESSIBILITY (ICTA), 2013
  • [40] Shape from contour for the digitization of curved documents
    Courteille, Frederic
    Durou, Jean-Denis
    Gurdjos, Pierre
    COMPUTER VISION - ACCV 2007, PT II, PROCEEDINGS, 2007, 4844 : 196 - 205