An ensemble machine learning model based on multiple filtering and supervised attribute clustering algorithm for classifying cancer samples

Authors
Bose S. [1]
Das C. [1]
Banerjee A. [1]
Ghosh K. [2]
Chattopadhyay M. [3]
Chattopadhyay S. [4]
Barik A. [1]
Affiliations
[1] Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, West Bengal
[2] Machine Intelligence Unit & Center for Soft Computing Research, Indian Statistical Institute, Kolkata, West Bengal
[3] School of Education Technology, Jadavpur University, Kolkata, West Bengal
[4] Department of Information Technology, Jadavpur University, Kolkata, West Bengal
Keywords
Attribute clustering; DNA microarray; Ensemble classifier; Filter; Gene expression data; Machine learning
DOI
10.7717/PEERJ-CS.671
Abstract
Background: Machine learning is a machine intelligence technique that learns from data and detects inherent patterns in large, complex datasets. Because of this capability, machine learning techniques are widely used in medical applications, especially those involving large-scale genomic and proteomic data. Cancer classification based on bio-molecular profiling data is a very important topic in medical applications because it improves the accuracy of cancer diagnosis, and machine learning techniques are widely used in cancer detection and prognosis.
Methods: In this article, a new ensemble machine learning classification model, the Multiple Filtering and Supervised Attribute Clustering algorithm based Ensemble Classification model (MFSAC-EC), is proposed; it can handle both the class imbalance problem and the high dimensionality of microarray datasets. The model first generates a number of bootstrapped datasets from the original training data, applying an oversampling procedure to handle class imbalance. The proposed MFSAC method is then applied to each bootstrapped dataset to generate sub-datasets, each of which contains a subset of the most relevant/informative attributes of the original dataset. MFSAC is an attribute selection technique that combines multiple filters with a new supervised attribute clustering algorithm. A base classifier is then constructed separately for every sub-dataset, and finally the predictions of these base classifiers are combined by majority voting to form the MFSAC-based ensemble classifier. In addition, a number of the most informative attributes are selected as important features based on their frequency of occurrence across the sub-datasets.
Results: To assess the performance of the proposed MFSAC-EC model, it is applied to high-dimensional microarray gene expression datasets for cancer sample classification. The proposed model is compared with well-known existing models to establish its effectiveness.
The results show that the generalization performance (testing accuracy) of the proposed classifier is significantly better than that of other well-known existing models. The proposed model was also found to identify many important attributes (biomarker genes).
Subjects: Bioinformatics; Data Mining and Machine Learning
© Copyright 2021 Bose et al.
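The Methods pipeline summarized in the abstract (bootstrapping with oversampling, attribute selection on each sub-dataset, one base classifier per sub-dataset, majority voting, and ranking attributes by how often they are selected) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' MFSAC-EC implementation: the MFSAC step (multiple filters plus supervised attribute clustering) is replaced by a single hypothetical F-ratio filter (`f_scores`), the base classifier is a nearest-centroid rule, oversampling is plain random duplication of minority-class rows, and every function name and parameter (`n_models`, `k`, `seed`) is illustrative.

```python
import random
from collections import Counter
from statistics import mean, pvariance

def bootstrap(X, y, rng):
    """Draw len(X) rows with replacement (one bootstrapped dataset)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def oversample(X, y, rng):
    """Assumed scheme: duplicate random rows of smaller classes until every
    class is as large as the biggest one (plain random oversampling)."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out

def f_scores(X, y):
    """Hypothetical stand-in for MFSAC: one F-ratio-style filter per attribute
    (variance of class means divided by mean within-class variance)."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, lab in zip(X, y) if lab == c] for c in classes]
        between = pvariance([mean(g) for g in groups])
        within = mean([pvariance(g) for g in groups])
        scores.append(between / (within + 1e-9))
    return scores

def top_k(scores, k):
    """Indices of the k highest-scoring attributes (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def fit_centroids(X, y, features):
    """Nearest-centroid base classifier restricted to the selected attributes."""
    centroids = {}
    for c in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == c]
        centroids[c] = [mean([r[j] for r in rows]) for j in features]
    return centroids

def predict_one(centroids, features, row):
    """Label of the class centroid closest (squared Euclidean) to the row."""
    def sq_dist(c):
        return sum((row[j] - m) ** 2 for j, m in zip(features, centroids[c]))
    return min(centroids, key=sq_dist)

def train_ensemble(X, y, n_models=11, k=2, seed=0):
    """One (selected attributes, base classifier) pair per bootstrapped sub-dataset."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        Xb, yb = bootstrap(X, y, rng)
        Xb, yb = oversample(Xb, yb, rng)
        feats = top_k(f_scores(Xb, yb), k)
        models.append((feats, fit_centroids(Xb, yb, feats)))
    return models

def predict(models, row):
    """Majority vote over the base classifiers' predicted labels."""
    votes = Counter(predict_one(cents, feats, row) for feats, cents in models)
    return votes.most_common(1)[0][0]

def feature_frequency(models):
    """How often each attribute index was selected across the sub-datasets."""
    return Counter(j for feats, _ in models for j in feats)

# Toy imbalanced data: attributes 0 and 1 separate the classes, 2-5 are noise.
data_rng = random.Random(42)
X, y = [], []
for label, center, count in [(0, 0.0, 20), (1, 5.0, 5)]:
    for _ in range(count):
        informative = [center + data_rng.uniform(-1, 1) for _ in range(2)]
        noise = [data_rng.uniform(-1, 1) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)

models = train_ensemble(X, y)
print(predict(models, [5, 5, 0, 0, 0, 0]))
print(feature_frequency(models).most_common(2))
```

Because each base classifier is trained on a different bootstrapped sub-dataset, the attributes it selects can differ from those of its neighbours; counting selections across the ensemble with `feature_frequency` mirrors the abstract's idea of picking important attributes by their frequency of occurrence. An odd `n_models` avoids voting ties in two-class problems.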
Pages: 1 / 40
Page count: 39