A Highly Accurate PDF-To-Text Conversion System for Academic Papers Using Natural Language Processing Approach

Cited by: 0
Authors
Yong, Tien Fui [1]
Azad, Saiful [2, 3]
Rahman, Mohammed Mostafizur [4]
Zamli, Kamal Z. [2, 3]
Rabby, Gollam [2]
Affiliations
[1] Univ Tunku Abdul Rahman, Fac Informat & Commun Technol, Kampar 31900, Perak, Malaysia
[2] Univ Malaysia Pahang, Fac Comp Syst & Software Engn, Gambang 26300, Pahang, Malaysia
[3] UMP, IBM Ctr Excellence, Gambang, Malaysia
[4] Amer Int Univ Bangladesh, Dhaka, Bangladesh
Keywords
PDF-To-Text Conversion; Natural Language Processing; Edit Distance
DOI
10.1166/asl.2018.13029
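Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract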
Extracting text from PDF documents is never an easy task when high accuracy and consistency are the two main criteria to be met. Although a considerable number of such systems exist, most of them fall short of the desired performance, especially where academic literature is concerned. Researchers who are heavily involved in text mining and project analysis need an accurate and consistent supporting tool for PDF-To-Text (PTT) conversion. Therefore, in this paper, we propose a Natural Language Processing based PDF-to-text (NLPDF) conversion system, which comprises two major steps, namely (i) reading the contents of the PDF and (ii) reconstructing the text. The performance of the proposed system is evaluated via four metrics, namely Precision, Recall, F-Measure (AF), and standard deviation, and is compared with eight other similar benchmarked systems available in the market. The experimental results demonstrate the effectiveness of the proposed system.
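The abstract names Precision, Recall, F-Measure, and standard deviation but does not define how they are computed. The following is a minimal sketch of one common way to score an extracted text against a ground-truth transcript, assuming whitespace tokenization and bag-of-words overlap. The function name `token_prf` and this token-level definition are illustrative assumptions, not the paper's stated method.

```python
from collections import Counter
from statistics import pstdev


def token_prf(extracted: str, reference: str) -> tuple[float, float, float]:
    """Token-level precision, recall, and F-measure (hypothetical helper).

    Assumption: tokens are whitespace-separated words and matching is
    multiset overlap; the paper may define its metrics differently.
    """
    ext = Counter(extracted.split())
    ref = Counter(reference.split())
    # Multiset intersection counts each token at most as often as it
    # appears in both texts.
    overlap = sum((ext & ref).values())
    precision = overlap / sum(ext.values()) if ext else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def f_measure_spread(pairs: list[tuple[str, str]]) -> float:
    """Population standard deviation of F-measure over (extracted, reference) pairs."""
    return pstdev(token_prf(ext, ref)[2] for ext, ref in pairs)
```

A low spread across documents would indicate consistent conversion quality, which is presumably why a standard deviation is reported alongside the averages.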
Pages: 7844-7849
Page count: 6