Optimizing Document Classification: Unleashing the Power of Genetic Algorithms

Times cited: 7
Authors
Mustafa, Ghulam [1 ]
Rauf, Abid [2 ]
Al-Shamayleh, Ahmad Sami [3 ]
Sulaiman, Muhammad [4 ]
Alrawagfeh, Wagdi [5 ]
Afzal, Muhammad Tanvir [1 ]
Akhunzada, Adnan
Affiliations
[1] Shifa Tameer-e-Millat Univ, Dept Comp, Islamabad 44000, Pakistan
[2] Univ Engn & Technol, Dept Comp Sci, Taxila 47080, Pakistan
[3] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Network & Cybersecur, Amman 19328, Jordan
[4] Univ Stavanger, Dept Comp Sci, N-9990 Stavanger, Norway
[5] Univ Doha Sci & Technol, Coll Comp & IT, Doha, Qatar
Keywords
Document classification (DC); Word2Vector (W2V); bag of words (BOW); term frequency (TF); Association for Computing Machinery (ACM); machine learning (ML); TEXT
DOI
10.1109/ACCESS.2023.3292248
CLC classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Many individuals, including researchers, professors, and students, have difficulty finding scholarly documents, papers, and journals within a specific domain. Consequently, scholars have turned their attention to the document classification problem and proposed various methods to address it. These approaches draw on diverse data sources, such as citations, metadata, content, and hybrids of these. Among these sources, the metadata-based approach stands out for research paper classification because metadata is freely available. Various scholars have used different metadata parameters of research articles, including the title, abstract, keywords, and general terms, for research paper classification. In this study, we chose four metadata-based features, namely title, keywords, abstract, and general terms, from the SANTOS dataset, which was prepared by ACM. To represent these features numerically, we employed BERT, a semantic-based model, instead of the commonly used count-based models. BERT generates a 768-dimensional vector for each record, which makes computation over the full feature space time-consuming. To address this, our proposed model optimizes the features using a genetic algorithm. Optimal feature selection plays a crucial role in this domain: selecting the most relevant features from this high-dimensional space improves the overall accuracy of the document classification system while reducing its time complexity. For classification, we employed Gaussian Naive Bayes (GNB) and support vector machine (SVM) classifiers. The results of our study revealed that the combination of title and keywords outperformed the other feature combinations.
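The abstract describes a pipeline in which each record's metadata is embedded as a 768-dimensional BERT vector, a genetic algorithm selects a subset of those dimensions, and GNB or SVM classifies the result. The following Python sketch illustrates only the genetic-algorithm feature-selection step, under stated assumptions; it is not the authors' implementation. Each individual is a binary mask over feature dimensions, and its fitness is the validation accuracy of a small pure-Python Gaussian Naive Bayes classifier (standing in for a library GNB) minus a small penalty per selected feature. The names `fit_gnb`, `predict_gnb`, `fitness`, `ga_select` and `make_toy_data`, the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), and all hyperparameters are hypothetical choices. Synthetic 8-dimensional vectors replace real BERT embeddings to keep the example self-contained.

```python
"""GA feature selection with a Gaussian Naive Bayes fitness (illustrative sketch).

Hypothetical setup, not the paper's code: each record is a dense vector (e.g. a
768-d BERT embedding); synthetic 8-d vectors are used here instead.
"""
import math
import random


def fit_gnb(X, y, mask):
    """Estimate per-class log prior, means and variances on the selected features."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        means, variances = [], []
        for j in mask:
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            means.append(mu)
            # small constant keeps every variance strictly positive
            variances.append(sum((v - mu) ** 2 for v in col) / len(col) + 1e-9)
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_gnb(model, x, mask):
    """Return the class maximizing log prior + sum of Gaussian log-likelihoods."""
    best, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for j, mu, var in zip(mask, means, variances):
            score -= 0.5 * (math.log(2 * math.pi * var) + (x[j] - mu) ** 2 / var)
        if score > best_score:
            best, best_score = label, score
    return best


def fitness(bits, X_tr, y_tr, X_val, y_val, penalty=0.01):
    """Validation accuracy of GNB on the masked features, minus a size penalty."""
    mask = [j for j, b in enumerate(bits) if b]
    if not mask:
        return 0.0
    model = fit_gnb(X_tr, y_tr, mask)
    correct = sum(predict_gnb(model, x, mask) == t for x, t in zip(X_val, y_val))
    return correct / len(y_val) - penalty * len(mask) / len(bits)


def ga_select(X_tr, y_tr, X_val, y_val, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve binary feature masks; return the indices selected by the best mask."""
    rng = random.Random(seed)
    n = len(X_tr[0])
    cache = {}

    def score(bits):
        key = tuple(bits)
        if key not in cache:  # memoize: each distinct mask is evaluated once
            cache[key] = fitness(bits, X_tr, y_tr, X_val, y_val)
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = sorted(pop, key=score, reverse=True)[:2]  # elitism: keep the 2 best
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection (size 3)
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation: each bit flips independently with probability p_mut
            child = [(not b) if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return [j for j, b in enumerate(best) if b]


def make_toy_data(n, rng, n_noise=6):
    """Synthetic stand-in for embeddings: 2 informative dims, then pure-noise dims."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(3.0 * label, 1.0) for _ in range(2)]
        X.append(informative + [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(42)
    X_tr, y_tr = make_toy_data(200, rng)
    X_val, y_val = make_toy_data(300, rng)
    print("selected feature indices:", ga_select(X_tr, y_tr, X_val, y_val))
```

With real embeddings, each row of `X` would hold 768 values per record (for example, the embedding of a title-plus-keywords combination), and the per-feature penalty is what pushes the search toward smaller subsets, which is where the reduction in time complexity mentioned above would come from. An SVM-based fitness could replace the GNB one without changing the GA loop.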
Pages: 83136-83149
Page count: 14
Related papers
50 in total
  • [21] Optimizing a PIFA using a genetic algorithms approach
    Kouveliotis, N. K.
    Panagiotou, S. C.
    Varlamos, P. K.
    Dimousios, T. D.
    Capsalis, C. N.
    JOURNAL OF ELECTROMAGNETIC WAVES AND APPLICATIONS, 2008, 22 (2-3) : 453 - 461
  • [22] Optimizing doped libraries by using genetic algorithms
    Tomandl, Dirk
    Schober, Andreas
    Schwienhorst, Andreas
    JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN, 1997, 11 (01) : 29 - 38
  • [23] Optimizing Linear Antenna Arrays with Genetic Algorithms
    Valdez, Libis
    Viloria-Nunez, Cesar
    Ripoll-Solano, Lacides
    Guerrero-Granados, Bethsy
    2024 IEEE COLOMBIAN CONFERENCE ON COMMUNICATIONS AND COMPUTING, COLCOM 2024, 2024,
  • [26] Optimizing interleaver for turbo codes by genetic algorithms
    Kromer, Pavel
    Snasel, Vaclav
    Platos, Jan
    Ouddane, Nabil
    19TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, VOL I, PROCEEDINGS, 2007, : 423 - +
  • [27] Genetic algorithms in optimizing surveillance and maintenance of components
    Munoz, A
    Martorell, S
    Serradell, V
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 1997, 57 (02) : 107 - 120
  • [28] Optimizing Genetic Algorithms Using the Binomial Distribution
    Computer Science, School of Business, Stockton University, 101 Vera King Farris Dr, Galloway, NJ, United States
    Int. Jt. Conf. Comput. Intell., : 159 - 169
  • [29] Genetic algorithms for optimizing the remediation of contaminated aquifer
    Gümrah, F
    Erbas, D
    Öz, B
    Altintas, S
    TRANSPORT IN POROUS MEDIA, 2000, 41 (02) : 149 - 171
  • [30] Decision Support System based on genetic algorithms for optimizing the Operation Planning of Hydrothermal Power Systems
    Alencar, T. R.
    Gramulia, J., Jr.
    Otobe, R. F., Jr.
    Asano, P. T. L.
    2015 5TH INTERNATIONAL YOUTH CONFERENCE ON ENERGY (IYCE), 2015,