An ensemble machine learning model based on multiple filtering and supervised attribute clustering algorithm for classifying cancer samples

Cited by: 0
Authors
Bose S. [1 ]
Das C. [1 ]
Banerjee A. [1 ]
Ghosh K. [2 ]
Chattopadhyay M. [3 ]
Chattopadhyay S. [4 ]
Barik A. [1 ]
Affiliations
[1] Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, West Bengal
[2] Machine Intelligence Unit & Center for Soft Computing Research, Indian Statistical Institute, Kolkata, West Bengal
[3] School of Education Technology, Jadavpur University, Kolkata, West Bengal
[4] Department of Information Technology, Jadavpur University, Kolkata, West Bengal
Keywords
Attribute clustering; DNA Microarray; Ensemble classifier; Filter; Gene expression data; Machine learning
DOI
10.7717/PEERJ-CS.671
Abstract
Background: Machine learning is a machine intelligence technique that learns from data and detects inherent patterns in large, complex datasets. Because of this capability, machine learning techniques are widely used in medical applications, especially those involving large-scale genomic and proteomic data. Cancer classification based on bio-molecular profiling data is an important topic for medical applications since it improves the diagnostic accuracy of cancer, and machine learning techniques are widely used in cancer detection and prognosis.
Methods: This article proposes a new ensemble machine learning classification model, the Multiple Filtering and Supervised Attribute Clustering algorithm based Ensemble Classification model (MFSAC-EC), which can handle both the class imbalance problem and the high dimensionality of microarray datasets. The model first generates a number of bootstrapped datasets from the original training data, applying an oversampling procedure to handle class imbalance. The proposed MFSAC method is then applied to each of these bootstrapped datasets to generate sub-datasets, each of which contains a subset of the most relevant/informative attributes of the original dataset. The MFSAC method is a feature selection technique that combines multiple filters with a new supervised attribute clustering algorithm. A base classifier is then constructed separately for every sub-dataset, and finally the predictions of these base classifiers are combined by majority voting to form the MFSAC-based ensemble classifier. In addition, the most informative attributes are selected as important features based on their frequency of occurrence in these sub-datasets.
Results: To assess the performance of the proposed MFSAC-EC model, it is applied to high-dimensional microarray gene expression datasets for cancer sample classification. The proposed model is compared with well-known existing models to establish its effectiveness. The results show that the generalization performance (testing accuracy) of the proposed classifier is significantly better than that of other well-known existing models. The proposed model can also identify many important attributes/biomarker genes.
Subjects: Bioinformatics, Data Mining and Machine Learning
Copyright 2021 Bose et al.
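The pipeline outlined in the Methods part of the abstract (oversampled bootstrapping, per-sample attribute selection, one base classifier per sub-dataset, majority voting, and attribute frequency counting) can be illustrated with a minimal sketch. This is not the authors' MFSAC implementation: a single filter based on the spread of class means stands in for the multiple filters and the supervised attribute clustering step, a nearest-centroid rule stands in for the base classifier, and all function names, parameters, and the toy data are hypothetical.

```python
import random
from collections import Counter

def oversampled_bootstrap(X, y, rng):
    """Bootstrap each class up to the size of the largest class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        for _ in range(target):
            Xb.append(rng.choice(rows))  # sampling with replacement
            yb.append(label)
    return Xb, yb

def class_means(X, y):
    """Per-class mean of every attribute."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def top_k_attributes(X, y, k):
    """Stand-in filter (assumption): rank attributes by spread of class means."""
    means = class_means(X, y)
    n_attr = len(X[0])
    score = [max(m[j] for m in means.values()) - min(m[j] for m in means.values())
             for j in range(n_attr)]
    return sorted(range(n_attr), key=lambda j: score[j], reverse=True)[:k]

def fit_base(X, y, attrs):
    """Stand-in base classifier (assumption): nearest centroid on selected attributes."""
    Xs = [[row[j] for j in attrs] for row in X]
    return attrs, class_means(Xs, y)

def predict_base(model, row):
    attrs, centroids = model
    x = [row[j] for j in attrs]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def fit_ensemble(X, y, n_models=15, k=2, seed=0):
    """Train one base model per bootstrapped sub-dataset and count attribute usage."""
    rng = random.Random(seed)
    models, freq = [], Counter()
    for _ in range(n_models):
        Xb, yb = oversampled_bootstrap(X, y, rng)
        attrs = top_k_attributes(Xb, yb, k)
        freq.update(attrs)  # how often each attribute is selected
        models.append(fit_base(Xb, yb, attrs))
    return models, freq

def predict_ensemble(models, row):
    """Majority vote over the base classifiers."""
    votes = Counter(predict_base(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Toy, made-up data: attributes 0 and 1 separate the classes, 2 and 3 are noise.
X = [[0.1, 0.2, 0.3, -0.1], [-0.2, 0.0, -0.3, 0.2], [0.0, -0.1, 0.1, 0.4],
     [0.3, 0.1, -0.2, -0.3], [-0.1, 0.3, 0.2, 0.0], [0.2, -0.2, 0.0, 0.1],
     [5.1, -4.8, 0.2, 0.1], [4.9, -5.2, -0.1, -0.2], [5.0, -5.0, 0.3, 0.3]]
y = ["normal"] * 6 + ["tumor"] * 3  # imbalanced on purpose

models, freq = fit_ensemble(X, y)
print(predict_ensemble(models, [4.8, -4.9, 0.0, 0.0]))
print(freq.most_common())
```

On this toy data the two informative attributes are selected in every bootstrap round by construction, so the frequency counter singles them out, which mirrors how the abstract describes identifying important attributes by their frequency of occurrence across sub-datasets.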
Pages: 1 / 40
Number of pages: 39
Related papers
50 in total
  • [31] Spam SMS filtering based on text features and supervised machine learning techniques
    Abid, Muhammad Adeel
    Ullah, Saleem
    Siddique, Muhammad Abubakar
    Mushtaq, Muhammad Faheem
    Aljedaani, Wajdi
    Rustam, Furqan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (28) : 39853 - 39871
  • [32] A Genetic Algorithm Based Clustering Ensemble Approach to Learning Relational Databases
    Alfred, Rayner
    Chiye, Gabriel Jong
    Obit, Joe Henry
    Hijazi, Mohd Hanafi Ahmad
    Chin, Kim On
    Lau, HuiKeng
    ADVANCED SCIENCE LETTERS, 2015, 21 (10) : 3313 - 3317
  • [33] Machine learning algorithms for simultaneous supervised detection of peaks in multiple samples and cell types
    Hocking, Toby Dylan
    Bourque, Guillaume
    PACIFIC SYMPOSIUM ON BIOCOMPUTING 2020, 2020, : 367 - 378
  • [34] Building a Classification Model based on Feature Engineering for the Prediction of Wine Quality by Employing Supervised Machine Learning and Ensemble Learning Techniques
    Nandan, Mauparna
    Gupta, Harsh Raj
    Mondal, Moutusi
    2023 INTERNATIONAL CONFERENCE ON COMPUTER, ELECTRICAL & COMMUNICATION ENGINEERING, ICCECE, 2023,
  • [35] Persistent Homology-Based Machine Learning Method for Filtering and Classifying Mammographic Microcalcification Images in Early Cancer Detection
    Malek, Aminah Abdul
    Alias, Mohd Almie
    Razak, Fatimah Abdul
    Noorani, Mohd Salmi Md
    Mahmud, Rozi
    Zulkepli, Nur Fariha Syaqina
    CANCERS, 2023, 15 (09)
  • [36] Classification Risk-Based Semi-supervised Ensemble Learning Algorithm
    He Y.
    Zhu P.
    Huang Z.
    Philippe F.-V.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2024, 37 (04) : 339 - 351
  • [37] Machine-Deep-Ensemble Learning Model for Classifying Cybersickness Caused by Virtual Reality Immersion
    Oh, SeungJun
    Kim, Dong-Keun
    CYBERPSYCHOLOGY BEHAVIOR AND SOCIAL NETWORKING, 2021, 24 (11) : 729 - 736
  • [38] A Neural Learning-Based Clustering Model for Collaborative Filtering
    Mika, Grzegorz P.
    Dziczkowski, Grzegorz
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2018, PT I, 2018, 11055 : 219 - 227
  • [39] Clustering based semi-supervised machine learning for DDoS attack classification
    Misbahuddin, Mohammad
    Zaidi, Syed Mustafa Ali
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2021, 33 (04) : 436 - 446
  • [40] Design Ensemble Machine Learning Model for Breast Cancer Diagnosis
    Sheau-Ling Hsieh
    Sung-Huai Hsieh
    Po-Hsun Cheng
    Chi-Huang Chen
    Kai-Ping Hsu
    I-Shun Lee
    Zhenyu Wang
    Feipei Lai
    Journal of Medical Systems, 2012, 36 : 2841 - 2847