Optimizing Document Classification: Unleashing the Power of Genetic Algorithms

Cited by: 7
Authors
Mustafa, Ghulam [1]
Rauf, Abid [2]
Al-Shamayleh, Ahmad Sami [3]
Sulaiman, Muhammad [4]
Alrawagfeh, Wagdi [5]
Afzal, Muhammad Tanvir [1]
Akhunzada, Adnan
Affiliations
[1] Shifa Tameer Emillat Univ, Dept Comp, Islamabad 44000, Pakistan
[2] Univ Engn & Technol, Dept Comp Sci, Taxila 47080, Pakistan
[3] Al Ahliyya Amman Univ, Fac Informat Technol, Dept Network & Cybersecur, Amman 19328, Jordan
[4] Univ Stavanger, Dept Comp Sci, N-9990 Stavanger, Norway
[5] Univ Doha Sci & Technol, Coll Comp & IT, Doha, Qatar
Keywords
Document classification (DC); Word2Vector (W2V); bag of word (BOW); term frequency (TF); association for computing machinery (ACM); machine learning (ML); TEXT;
DOI
10.1109/ACCESS.2023.3292248
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Many individuals, including researchers, professors, and students, encounter difficulties when searching for scholarly documents, papers, and journals within a specific domain. Consequently, scholars have begun to focus on the document classification problem and have proposed various methods to address it. Researchers have drawn on diverse data sources in their approaches, such as citations, metadata, content, and hybrids of these. Among these sources, metadata stands out for research paper classification because it is freely available. Various scholars have employed different metadata parameters of research articles, including the title, abstract, keywords, and general terms. In this study, we chose four metadata-based features (title, keywords, abstract, and general terms) from the SANTOS dataset, which was prepared by ACM. To represent these features numerically, we employed a semantic model, BERT, instead of the commonly used count-based models. BERT generates a 768-dimensional vector for each record, which makes downstream computation expensive. Our proposed model therefore optimizes the features using a genetic algorithm. Optimal feature selection plays a crucial role in this domain: it improves the overall accuracy of the document classification system while reducing the time cost of working in this high-dimensional space. For classification, we employed Gaussian naive Bayes (GNB) and support vector machine (SVM) classifiers. Our results showed that the combination of title and keywords outperformed other combinations.
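The abstract does not say how the genetic algorithm is configured, so the following is only a minimal sketch of genetic feature selection in general, not the authors' method. Everything in it is an assumption made for illustration: the toy 12-dimensional random data (standing in for the 768-dimensional BERT vectors), the nearest-centroid scorer (standing in for GNB or SVM), the per-feature penalty, and the choice of truncation selection, one-point crossover, bit-flip mutation, and elitism. All function names are hypothetical.

```python
import random

def nearest_centroid_accuracy(X, y, mask):
    """Stand-in scorer (assumption): training accuracy of a nearest-centroid
    classifier restricted to the features where mask is 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(X)

def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature (assumed trade-off)."""
    return nearest_centroid_accuracy(X, y, mask) - penalty * sum(mask)

def select_features(X, y, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Evolve a binary mask over feature columns and return the fittest one."""
    rng = random.Random(seed)
    n = len(X[0])
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda m: fitness(X, y, m), reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=lambda m: fitness(X, y, m))

def make_toy_data(n_samples=60, n_features=12, informative=(0, 1), seed=1):
    """Two balanced classes; only the `informative` columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        for j in informative:
            row[j] += 4.0 * label
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy_data()
best_mask = select_features(X, y)
print("selected feature indices:", [j for j, g in enumerate(best_mask) if g])
```

In a setting like the one the abstract describes, `X` would hold one embedding per record and the scorer would be replaced by cross-validated accuracy of the chosen classifier; scoring on held-out data rather than training data matters there, since a mask can otherwise overfit.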
Pages: 83136 - 83149
Number of pages: 14