An ensemble machine learning model based on multiple filtering and supervised attribute clustering algorithm for classifying cancer samples

Cited by: 0
Authors
Bose S. [1]
Das C. [1]
Banerjee A. [1]
Ghosh K. [2]
Chattopadhyay M. [3]
Chattopadhyay S. [4]
Barik A. [1]
Affiliations
[1] Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, West Bengal
[2] Machine Intelligence Unit & Center for Soft Computing Research, Indian Statistical Institute, Kolkata, West Bengal
[3] School of Education Technology, Jadavpur University, Kolkata, West Bengal
[4] Department of Information Technology, Jadavpur University, Kolkata, West Bengal
Keywords
Attribute clustering; DNA microarray; Ensemble classifier; Filter; Gene expression data; Machine learning
DOI
10.7717/PEERJ-CS.671
Abstract
Background: Machine learning is a machine intelligence technique that learns from data and detects inherent patterns in large, complex datasets. Because of this capability, machine learning techniques are widely used in medical applications, especially those involving large-scale genomic and proteomic data. Cancer classification based on bio-molecular profiling data is an important topic for medical applications because it improves the diagnostic accuracy of cancer, and machine learning techniques are widely used in cancer detection and prognosis.
Methods: This article proposes a new ensemble machine learning classification model, the Multiple Filtering and Supervised Attribute Clustering algorithm based Ensemble Classification model (MFSAC-EC), which can handle the class imbalance problem and the high dimensionality of microarray datasets. The model first generates a number of bootstrapped datasets from the original training data, applying an oversampling procedure to handle class imbalance. The proposed MFSAC method is then applied to each of these bootstrapped datasets to generate sub-datasets, each of which contains a subset of the most relevant/informative attributes of the original dataset. The MFSAC method is a feature selection technique that combines multiple filters with a new supervised attribute clustering algorithm. A base classifier is then constructed separately for every sub-dataset, and finally the predictions of these base classifiers are combined by majority voting to form the MFSAC-based ensemble classifier. In addition, a number of the most informative attributes are selected as important features based on their frequency of occurrence in these sub-datasets.
Results: To assess the performance of the proposed MFSAC-EC model, it is applied to high-dimensional microarray gene expression datasets for cancer sample classification. The proposed model is compared with well-known existing models to establish its effectiveness. The results show that the generalization performance (testing accuracy) of the proposed classifier is significantly better than that of the other well-known existing models. The proposed model can also identify many important attributes/biomarker genes.
Subjects: Bioinformatics, Data Mining and Machine Learning
© Copyright 2021 Bose et al.
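The pipeline described under Methods (balanced bootstrapping, per-sample feature selection, one base classifier per sub-dataset, majority voting, and feature-frequency counting) can be illustrated with a minimal sketch. This is not the authors' MFSAC algorithm: the filter score, the nearest-centroid base classifier, the toy data, and all function names below are assumptions chosen only to keep the example small and dependency-free.

```python
import random
from collections import Counter
from statistics import mean, pstdev

def balanced_bootstrap(X, y, rng):
    """Resample every class with replacement up to the largest class size."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        for _ in range(target):
            Xb.append(rng.choice(rows))
            yb.append(label)
    return Xb, yb

def filter_scores(X, y):
    """Stand-in filter (NOT MFSAC): gap between class means over overall spread."""
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        means = [mean(r[j] for r, lab in zip(X, y) if lab == c) for c in labels]
        scores.append((max(means) - min(means)) / (pstdev(col) or 1e-12))
    return scores

def fit_base(X, y, k):
    """Keep the k best-scoring features and fit a nearest-centroid classifier."""
    scores = filter_scores(X, y)
    feats = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    centroids = {}
    for c in set(y):
        rows = [r for r, lab in zip(X, y) if lab == c]
        centroids[c] = [mean(r[j] for r in rows) for j in feats]
    return feats, centroids

def predict_base(model, row):
    feats, centroids = model
    def dist(c):
        return sum((row[j] - m) ** 2 for j, m in zip(feats, centroids[c]))
    return min(sorted(centroids), key=dist)

def fit_ensemble(X, y, n_models=11, k=2, seed=0):
    """Train one base model per balanced bootstrap; count selected features."""
    rng = random.Random(seed)
    models = [fit_base(*balanced_bootstrap(X, y, rng), k) for _ in range(n_models)]
    freq = Counter(j for feats, _ in models for j in feats)
    return models, freq

def predict_ensemble(models, row):
    """Majority vote over the base classifiers' predicted labels."""
    votes = Counter(predict_base(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical imbalanced toy data: features 0 and 3 carry the class signal.
data_rng = random.Random(42)
X, y = [], []
for label, n in (("normal", 12), ("tumor", 4)):
    shift = 0.0 if label == "normal" else 5.0
    for _ in range(n):
        row = [data_rng.gauss(0, 1) for _ in range(6)]
        row[0] += shift
        row[3] -= shift
        X.append(row)
        y.append(label)

models, freq = fit_ensemble(X, y)
print(predict_ensemble(models, [5.0, 0, 0, -5.0, 0, 0]))
print(freq.most_common(2))  # features chosen most often across sub-datasets
```

The feature-frequency counter mirrors the abstract's final selection step: attributes that recur across many sub-datasets are the ones reported as important. Resampling each class separately is one simple way to combine bootstrapping with oversampling; the paper's exact procedure may differ.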
Pages: 1 / 40
Page count: 39
Related papers
50 in total
  • [41] Design Ensemble Machine Learning Model for Breast Cancer Diagnosis
    Hsieh, Sheau-Ling
    Hsieh, Sung-Huai
    Cheng, Po-Hsun
    Chen, Chi-Huang
    Hsu, Kai-Ping
    Lee, I-Shun
    Wang, Zhenyu
    Lai, Feipei
    JOURNAL OF MEDICAL SYSTEMS, 2012, 36 (05) : 2841 - 2847
  • [42] A new approach of clustering based machine-learning algorithm
    Al-Omary, Alauddin Yousif
    Jamil, Mohammad Shahid
    KNOWLEDGE-BASED SYSTEMS, 2006, 19 (04) : 248 - 258
  • [43] Multiple classification algorithm based on ensemble learning for intrusion detection
    Liu, Fulai
    Yue, Jiaqi
    Hu, Zhongyi
    Du, Ruiyan
    WIRELESS NETWORKS, 2025, 31 (03) : 2143 - 2154
  • [44] Ensemble-Based Machine Learning Algorithms for Classifying Breast Tissue Based on Electrical Impedance Spectroscopy
    Rahman, Sam Matiur
    Ali, Md Asraf
    Altwijri, Omar
    Alqahtani, Mahdi
    Ahmed, Nasim
    Ahamed, Nizam U.
    ADVANCES IN ARTIFICIAL INTELLIGENCE, SOFTWARE AND SYSTEMS ENGINEERING, 2020, 965 : 260 - 266
  • [45] Fitting Multiple Machine Learning Models With Performance Based Clustering
    Lorasdagi, Mehmet E.
    Koc, Ahmet B.
    Koc, Ali T.
    Kozat, Suleyman S.
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 816 - 820
  • [46] Temporal Event Detection Using Supervised Machine Learning Based Algorithm
    Bansal, Rakshita
    Rani, Monika
    Kumar, Harish
    Kaushal, Sakshi
    INNOVATIONS IN BIO-INSPIRED COMPUTING AND APPLICATIONS, 2019, 939 : 257 - 268
  • [47] LEACH Based WSN Classification Using Supervised Machine Learning Algorithm
    Mustary, Shabnom
    Abul Kashem, Mohammod
    Khan, Nurul Islam
    Jewel, Faruq Ahmed
    Islam, Monirul
    Islam, Saiful
    2021 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS (ICCCI), 2021,
  • [48] K-means clustering algorithm based on semi-supervised learning
    Department of Mathematics and Computer, Shangrao Normal College, Shangrao 334001, China
    Unknown
    J. Comput. Inf. Syst., 2008, 5 (2007-2013):
  • [49] Classifying Cancer Patients Based on DNA Sequences Using Machine Learning
    Hussain, Fahad
    Saeed, Umair
    Muhammad, Ghulam
    Islam, Noman
    Sheikh, Ghazala Shafi
    JOURNAL OF MEDICAL IMAGING AND HEALTH INFORMATICS, 2019, 9 (03) : 436 - 443
  • [50] HYPERSPECTRAL SUPERVISED CLASSIFICATION USING MEAN FILTERING BASED KERNEL EXTREME LEARNING MACHINE
    Shang, Wenting
    Wu, Zebin
    Xu, Yang
    Zhang, Yan
    Wei, Zhihui
    2018 FIFTH INTERNATIONAL WORKSHOP ON EARTH OBSERVATION AND REMOTE SENSING APPLICATIONS (EORSA), 2018, : 476 - 479