Protein Sequence-Based COVID-19 Detection: A Comparative Study of Machine Learning Classification Methods

Authors
Aminah, Siti [1]
Ardaneswari, Gianinna [1]
Awang, Mohd Khalid [2]
Yusaputra, Muhammad Ariq [1]
Sari, Dian Puspita [1]
Affiliations
[1] Univ Indonesia, Fac Math & Nat Sci, Dept Math, Depok 16424, Indonesia
[2] Univ Sultan Zainal Abidin, Fac Informat & Comp, Besut 22200, Terengganu, Malaysia
Keywords
Compendex;
DOI
10.1155/2024/8683822
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Coronaviruses, including severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), continue to pose a significant public health challenge globally, even in 2024. Despite advancements in vaccines and treatments, the accurate classification of coronavirus protein sequences remains crucial for monitoring variants, understanding viral behavior, and developing targeted interventions. In this study, we investigate the efficacy of various classification methods in accurately classifying coronavirus protein sequences. We explore the use of K-nearest neighbor (KNN), fuzzy KNN (FKNN), support vector machine (SVM), and SVM with particle swarm optimization (PSO-SVM) algorithms for classification, complemented by feature selection techniques including principal component analysis (PCA) and random forest-recursive feature elimination (RF-RFE). Our dataset comprises 2000 protein sequences, evenly split between SARS-CoV-2 and non-SARS-CoV-2 sequences. Through rigorous analysis, we evaluate the performance of each classification model in terms of accuracy, sensitivity, specificity, and receiver operating characteristic area under the curve (ROC-AUC). Our findings demonstrate consistently high performance across all models, reflecting their efficacy in classifying coronavirus protein sequences. Notably, the PCA + PSO-SVM model emerges as the top-performing model, exhibiting the highest classification accuracy, specificity, and ROC-AUC score, demonstrating its effectiveness in distinguishing between SARS-CoV-2 and non-SARS-CoV-2 sequences. Overall, our study highlights the importance of employing advanced classification methods and feature selection techniques in accurately classifying coronavirus protein sequences. The findings provide valuable insights for researchers and practitioners in the field of bioinformatics and contribute to ongoing efforts in understanding and combating the COVID-19 pandemic and its evolving challenges.
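The four evaluation measures named in the abstract (accuracy, sensitivity, specificity, and ROC-AUC) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names `binary_metrics` and `roc_auc`, the label convention (1 = SARS-CoV-2, 0 = non-SARS-CoV-2), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from hard 0/1 predictions.

    Assumes label 1 is the positive class and that both classes occur in
    y_true (otherwise a denominator below would be zero).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values, not data from the study).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]         # hypothetical classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

The pairwise definition of ROC-AUC is used here because it needs no threshold sweep; it is quadratic in the number of samples, so a rank-based or library implementation would be preferable on a dataset of the size described above.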
Pages: 14