A Highly Accurate PDF-To-Text Conversion System for Academic Papers Using Natural Language Processing Approach

Citations: 0
Authors
Yong, Tien Fui [1]
Azad, Saiful [2, 3]
Rahman, Mohammed Mostafizur [4]
Zamli, Kamal Z. [2, 3]
Rabby, Gollam [2]
Affiliations
[1] Univ Tunku Abdul Rahman, Fac Informat & Commun Technol, Kampar 31900, Perak, Malaysia
[2] Univ Malaysia Pahang, Fac Comp Syst & Software Engn, Gambang 26300, Pahang, Malaysia
[3] UMP, IBM Ctr Excellence, Gambang, Malaysia
[4] Amer Int Univ Bangladesh, Dhaka, Bangladesh
Keywords
PDF-To-Text Conversion; Natural Language Processing; Edit Distance
DOI
10.1166/asl.2018.13029
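Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09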
Abstract
Extracting text from PDF documents is never easy when high accuracy and consistency are the two main criteria. Although a considerable number of such systems exist, most of them fall short of the desired performance, especially for academic literature. Researchers who are heavily involved in text mining and project analysis need an accurate and consistent supporting tool for PDF-To-Text (PTT) conversion. Therefore, in this paper, we propose a Natural Language Processing based PDF-to-text (NLPDF) conversion system, which comprises two major steps: (i) reading the contents of the PDF and (ii) reconstructing the text. The performance of the proposed system is evaluated using four metrics, namely Precision, Recall, F-Measure (AF), and standard deviation, and compared with eight other similar benchmarked systems available in the market. The experimental results demonstrate the effectiveness of the proposed system.
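The abstract names Precision, Recall, F-Measure, and standard deviation as evaluation metrics but does not give their exact definitions. As a rough illustration only, the sketch below scores an extracted text against a reference text at the whitespace-token level and reports the spread of F-Measure across documents. The token-level matching, the function names `score_extraction` and `f_measure_spread`, and the use of population standard deviation are all assumptions made for this example, not the paper's method.

```python
from collections import Counter
from statistics import mean, pstdev


def score_extraction(extracted: str, reference: str) -> tuple[float, float, float]:
    """Return (precision, recall, f_measure) over whitespace tokens.

    Assumption: tokens are matched as multisets, ignoring order.
    The paper may use a different unit or matching rule.
    """
    ext = Counter(extracted.split())
    ref = Counter(reference.split())
    # Multiset intersection counts each token at most as often as it
    # appears in both texts.
    overlap = sum((ext & ref).values())
    precision = overlap / sum(ext.values()) if ext else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def f_measure_spread(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (mean, population standard deviation) of F-Measure
    over a list of (extracted, reference) text pairs."""
    scores = [score_extraction(ext, ref)[2] for ext, ref in pairs]
    return mean(scores), pstdev(scores)
```

A lower standard deviation across a document set would indicate more consistent extraction quality, which is one way to read the abstract's pairing of accuracy with consistency.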
Pages: 7844-7849
Page count: 6