Optimizing Document Classification: Unleashing the Power of Genetic Algorithms

Cited by: 7
Authors
Mustafa, Ghulam [1 ]
Rauf, Abid [2 ]
Al-Shamayleh, Ahmad Sami [3 ]
Sulaiman, Muhammad [4 ]
Alrawagfeh, Wagdi [5 ]
Afzal, Muhammad Tanvir [1 ]
Akhunzada, Adnan
Affiliations
[1] Shifa Tameer Emillat Univ, Dept Comp, Islamabad 44000, Pakistan
[2] Univ Engn & Technol, Dept Comp Sci, Taxila 47080, Pakistan
[3] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Network & Cybersecur, Amman 19328, Jordan
[4] Univ Stavanger, Dept Comp Sci, N-9990 Stavanger, Norway
[5] Univ Doha Sci & Technol, Coll Comp & IT, Doha, Qatar
Keywords
Document classification (DC); Word2Vector (W2V); bag of word (BOW); term frequency (TF); association for computing machinery (ACM); machine learning (ML); TEXT;
DOI
10.1109/ACCESS.2023.3292248
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Many individuals, including researchers, professors, and students, encounter difficulties when searching for scholarly documents, papers, and journals within a specific domain. Consequently, scholars have begun to focus on the document classification problem, offering various methods to address it. Researchers have utilized diverse data sources in their approaches, such as citations, metadata, content, and hybrids of these. Among these sources, the metadata-based approach stands out for research paper classification because metadata is available at no cost. Various scholars have employed different metadata parameters of research articles, including the title, abstract, keywords, and general terms, for research paper classification. In this study, we chose four metadata-based features, namely title, keywords, abstract, and general terms, from the SANTOS dataset, which was prepared by ACM. To represent these features numerically, we employed a semantic-based model called BERT instead of the commonly used count-based models. BERT generates a 768-dimensional vector for each record, which introduces significant time complexity during computation. Our proposed model therefore optimizes the features using a genetic algorithm. Optimal feature selection plays a crucial role in this domain: it enhances the overall accuracy of the document classification system while reducing the time complexity associated with selecting the most relevant features from this high-dimensional space. For classification purposes, we employed GNB and SVM classifiers. The results of our study showed that the combination of title and keywords outperformed other combinations.
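The abstract describes selecting a subset of embedding dimensions with a genetic algorithm before classification. The following is a minimal sketch of that general idea, not the authors' implementation. It rests on several assumptions that are not in the abstract: synthetic 20-dimensional Gaussian data stands in for the 768-dimensional BERT vectors, a nearest-centroid classifier stands in for GNB and SVM so that the example needs only the standard library, and all names (`make_data`, `centroid_accuracy`, `fitness`, `select_features`) and hyperparameters (population size, mutation rate, per-feature penalty) are hypothetical choices for illustration.

```python
import random

def make_data(rng, n=120, d=20, informative=4):
    """Synthetic stand-in for embeddings: only the first `informative` dims carry class signal."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for j in range(informative):
            row[j] += 2.0 * label  # shift the informative dims for class 1
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(mask, X_tr, y_tr, X_te, y_te):
    """Hold-out accuracy of a nearest-centroid classifier restricted to the selected columns."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_te, y_te):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(y_te)

def fitness(mask, data, penalty=0.002):
    """Accuracy minus a small cost per selected feature (assumed trade-off, for illustration)."""
    return centroid_accuracy(mask, *data) - penalty * sum(mask)

def select_features(data, n_features, rng, pop_size=20, generations=15, p_mut=0.05):
    """Evolve bitmasks over feature columns; 1 keeps a column, 0 drops it."""
    population = [[1] * n_features]  # include the "keep everything" baseline
    population += [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        scored = [(fitness(m, data), m) for m in population]
        best = max(scored, key=lambda s: s[0])[1]
        next_gen = [best[:]]  # elitism: the best mask always survives
        while len(next_gen) < pop_size:
            # Tournament selection of two parents, three candidates each.
            p1 = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda s: s[0])[1]
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]  # bit-flip mutation
            next_gen.append(child)
        population = next_gen
    return max(population, key=lambda m: fitness(m, data))

rng = random.Random(0)
X, y = make_data(rng)
data = (X[:80], y[:80], X[80:], y[80:])  # simple train / hold-out split
best_mask = select_features(data, n_features=20, rng=rng)
print("selected feature indices:", [j for j, b in enumerate(best_mask) if b])
```

Because the initial population contains the all-ones mask and the best individual is carried over each generation, the returned mask never scores worse than using every feature under this fitness function. In a setting closer to the abstract, the fitness would presumably wrap a GNB or SVM classifier evaluated on real embeddings instead.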
Pages: 83136-83149
Page count: 14
Related papers
50 in total
  • [31] Decision support system based on Genetic Algorithms for optimizing the Operation Planning of Hydrothermal Power Systems
    Federal University of ABC, FABC, Santo-Andre-Sao-Paulo, Brazil
    IYCE - Proc.: Int. Youth Conf. Energy,
  • [32] Machine Learning Algorithms for Document Classification: Comparative Analysis
    Rashid, Faizur
    Gargaare, Suleiman M. A.
    Aden, Abdulkadir H.
    Abdi, Afendi
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (04) : 260 - 265
  • [33] Bayesian Web document classification through optimizing association word
    Ko, SJ
    Choi, JH
    Lee, JH
    DEVELOPMENTS IN APPLIED ARTIFICIAL INTELLIGENCE, 2003, 2718 : 565 - 574
  • [34] Optimizing Quantum Classification Algorithms on Classical Benchmark Datasets
    John, Manuel
    Schuhmacher, Julian
    Barkoutsos, Panagiotis
    Tavernelli, Ivano
    Tacchino, Francesco
    ENTROPY, 2023, 25 (06)
  • [35] DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding
    Feng, Hao
    Liu, Qi
    Liu, Hao
    Tang, Jingqun
    Zhou, Wengang
    Li, Houqiang
    Huang, Can
    SCIENCE CHINA-INFORMATION SCIENCES, 2024, 67 (12)
  • [36] DocPedia: unleashing the power of large multimodal model in the frequency domain for versatile document understanding
    Hao FENG
    Qi LIU
    Hao LIU
    Jingqun TANG
    Wengang ZHOU
    Houqiang LI
    Can HUANG
    Science China(Information Sciences), 2024, 67 (12) : 65 - 78
  • [37] PATTERN-CLASSIFICATION WITH GENETIC ALGORITHMS
    BANDYOPADHYAY, S
    MURTHY, CA
    PAL, SK
    PATTERN RECOGNITION LETTERS, 1995, 16 (08) : 801 - 808
  • [38] On Stability and Classification Tools for Genetic Algorithms
    Kotowski, Stefan
    Kosinski, Witold
    Michalewicz, Zbigniew
    Synak, Piotr
    Brocki, Lukasz
    FUNDAMENTA INFORMATICAE, 2009, 96 (04) : 477 - 491
  • [39] Use of Genetic Algorithms for Classification of Datasets
    Shanabog, Nandish C. S.
    Ashwinkumar, U. M.
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON RECENT TRENDS IN ELECTRONICS, INFORMATION & COMMUNICATION TECHNOLOGY (RTEICT), 2017, : 2016 - 2020
  • [40] Painter Classification Using Genetic Algorithms
    Levy, Erez
    David, Omid
    Netanyahu, Nathan S.
    2013 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2013, : 3027 - 3034