Ranked MSD: A New Feature Ranking and Feature Selection Approach for Biomarker Identification

Cited by: 1
Authors
Verma, Ghanshyam [1, 2]
Jha, Alokkumar [1, 2]
Rebholz-Schuhmann, Dietrich [3]
Madden, Michael G. [1, 2]
Affiliations
[1] Natl Univ Ireland Galway, Insight Ctr Data Analyt, Galway, Ireland
[2] Natl Univ Ireland Galway, Sch Comp Sci, Galway, Ireland
[3] Univ Cologne, ZB Med Informat Ctr Life Sci, Cologne, Germany
Source
MACHINE LEARNING AND KNOWLEDGE EXTRACTION, CD-MAKE 2019 | 2019 / Vol. 11713
Funding
Science Foundation Ireland;
Keywords
Machine learning; Respiratory viral infection; Feature ranking; Feature selection; Classification; Explainable AI; SUPPORT VECTOR MACHINES; SIGNATURE;
DOI
10.1007/978-3-030-29726-8_10
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
In the era of big data when a huge amount of data is continuously being generated, it is common for situations to arise where the number of samples is much smaller than the number of features (variables) per sample. This phenomenon is often found in biomedical domains, where we may have relatively few patients, compared to the amount of data per patient. For example, gene expression data typically has between 10,000 and 60,000 features per sample. A separate issue arises from the "right to explanation" found in the European General Data Protection Regulation (GDPR), which may prevent the use of black-box models in applications where explainability is required. In such situations, there is a need for robust algorithms which can identify the relevant features from experimental data by discarding irrelevant ones, yielding a simpler subset that facilitates explanation. To address these needs, we have developed a new algorithm for feature ranking and feature selection, named Ranked MSD. We have tested our proposed approach on two real-world gene expression data sets, both of which relate to respiratory viral infections. This Ranked MSD feature selection algorithm is able to reduce the feature set size from 12,023 genes (features) to 65 genes on the first data set and from 20,737 genes to 31 genes on the second data set, in both cases without any significant loss in disease prediction accuracy. In an alternative configuration, our proposed algorithm is able to identify a small subset of features that gives better accuracy than that of the full feature set. Our proposed algorithm can also identify important biomarkers (genes) with their importance score for a particular disease and the identified top-ranked biomarkers can play a vital role in drug discovery and precision medicine.
Pages: 147-167
Page count: 21