Optimizing Document Classification: Unleashing the Power of Genetic Algorithms

Cited by: 7
Authors
Mustafa, Ghulam [1]
Rauf, Abid [2]
Al-Shamayleh, Ahmad Sami [3]
Sulaiman, Muhammad [4]
Alrawagfeh, Wagdi [5]
Afzal, Muhammad Tanvir [1]
Akhunzada, Adnan
Affiliations
[1] Shifa Tameer Emillat Univ, Dept Comp, Islamabad 44000, Pakistan
[2] Univ Engn & Technol, Dept Comp Sci, Taxila 47080, Pakistan
[3] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Network & Cybersecur, Amman 19328, Jordan
[4] Univ Stavanger, Dept Comp Sci, N-9990 Stavanger, Norway
[5] Univ Doha Sci & Technol, Coll Comp & IT, Doha, Qatar
Keywords
Document classification (DC); Word2Vector (W2V); bag of words (BOW); term frequency (TF); association for computing machinery (ACM); machine learning (ML); TEXT
DOI
10.1109/ACCESS.2023.3292248
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Many individuals, including researchers, professors, and students, encounter difficulties when searching for scholarly documents, papers, and journals within a specific domain. Consequently, scholars have begun to focus on the document classification problem and have proposed various methods to address it. Researchers have drawn on diverse data sources in their approaches, such as citations, metadata, content, and hybrids of these. Among these sources, metadata stands out for research paper classification because it is freely available. Various scholars have employed different metadata parameters of research articles, including the title, abstract, keywords, and general terms, for research paper classification. In this study, we chose four metadata-based features, namely the title, keywords, abstract, and general terms, from the SANTOS dataset, which was prepared by ACM. To represent these features numerically, we employed a semantic model, BERT, instead of the commonly used count-based models. BERT generates a 768-dimensional vector for each record, which makes downstream computation expensive. Our proposed model therefore optimizes the features using a genetic algorithm. Optimal feature selection plays a crucial role in this domain: it improves the overall accuracy of the document classification system while reducing computation time by keeping only the most relevant features from this high-dimensional space. For classification, we employed GNB and SVM classifiers. The results of our study showed that the combination of title and keywords outperformed other combinations.
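The feature-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the genetic operators, fitness function, or hyperparameters, so everything below is an assumption made for illustration: tournament selection, one-point crossover, bit-flip mutation, elitism, a small penalty on the number of kept features, and a hand-written Gaussian naive Bayes whose training accuracy serves as fitness. The names `gnb_accuracy`, `select_features`, and `make_toy_data` are hypothetical, and an 8-dimensional synthetic dataset stands in for the 768-dimensional BERT vectors.

```python
import math
import random


def gnb_accuracy(X, y, mask):
    """Training accuracy of a tiny Gaussian naive Bayes on the columns kept by mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    stats = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        means = [sum(r[j] for r in rows) / len(rows) for j in cols]
        # Small constant added to each variance for numerical stability.
        varis = [sum((r[j] - m) ** 2 for r in rows) / len(rows) + 1e-6
                 for j, m in zip(cols, means)]
        stats[label] = (math.log(len(rows) / len(X)), means, varis)

    def predict(x):
        def log_posterior(label):
            prior, means, varis = stats[label]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * v) - (x[j] - m) ** 2 / (2 * v)
                for j, m, v in zip(cols, means, varis))
        return max(stats, key=log_posterior)

    return sum(predict(x) == t for x, t in zip(X, y)) / len(X)


def select_features(X, y, pop_size=20, generations=15, p_mut=0.05,
                    penalty=0.01, seed=0):
    """Evolve a 0/1 mask over feature columns (all settings are assumptions)."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            # Reward accuracy, lightly penalize the fraction of features kept.
            cache[key] = gnb_accuracy(X, y, mask) - penalty * sum(mask) / n
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n)                 # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # mutation
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best


def make_toy_data(n_rows=60, n_features=8, seed=1):
    """Synthetic two-class data: columns 0 and 1 are informative, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_rows):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 4 * label
        row[1] -= 4 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    mask = select_features(X, y)
    print("selected mask:", mask)
    print("training accuracy:", gnb_accuracy(X, y, mask))
```

Training accuracy is used as the fitness only to keep the sketch short; a held-out validation split or cross-validation would be the more defensible choice, and a library classifier such as an SVM could replace the hand-written naive Bayes inside `fitness` without changing the search loop.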
Pages: 83136-83149
Number of pages: 14