Ranked MSD: A New Feature Ranking and Feature Selection Approach for Biomarker Identification

Cited by: 1

Authors:

Verma, Ghanshyam [1,2]

Jha, Alokkumar [1,2]

Rebholz-Schuhmann, Dietrich [3]

Madden, Michael G. [1,2]

Affiliations:

[1] Natl Univ Ireland Galway, Insight Ctr Data Analyt, Galway, Ireland

[2] Natl Univ Ireland Galway, Sch Comp Sci, Galway, Ireland

[3] Univ Cologne, ZB Med Informat Ctr Life Sci, Cologne, Germany

Source:

MACHINE LEARNING AND KNOWLEDGE EXTRACTION, CD-MAKE 2019 | 2019 / Volume 11713

Funding:

Science Foundation Ireland;

Keywords:

Machine learning; Respiratory viral infection; Feature ranking; Feature selection; Classification; Explainable AI; SUPPORT VECTOR MACHINES; SIGNATURE;

DOI:

10.1007/978-3-030-29726-8_10

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In the era of big data when a huge amount of data is continuously being generated, it is common for situations to arise where the number of samples is much smaller than the number of features (variables) per sample. This phenomenon is often found in biomedical domains, where we may have relatively few patients, compared to the amount of data per patient. For example, gene expression data typically has between 10,000 and 60,000 features per sample. A separate issue arises from the "right to explanation" found in the European General Data Protection Regulation (GDPR), which may prevent the use of black-box models in applications where explainability is required. In such situations, there is a need for robust algorithms which can identify the relevant features from experimental data by discarding irrelevant ones, yielding a simpler subset that facilitates explanation. To address these needs, we have developed a new algorithm for feature ranking and feature selection, named Ranked MSD. We have tested our proposed approach on two real-world gene expression data sets, both of which relate to respiratory viral infections. This Ranked MSD feature selection algorithm is able to reduce the feature set size from 12,023 genes (features) to 65 genes on the first data set and from 20,737 genes to 31 genes on the second data set, in both cases without any significant loss in disease prediction accuracy. In an alternative configuration, our proposed algorithm is able to identify a small subset of features that gives better accuracy than that of the full feature set. Our proposed algorithm can also identify important biomarkers (genes) with their importance score for a particular disease and the identified top-ranked biomarkers can play a vital role in drug discovery and precision medicine.

Pages: 147 - 167

Number of pages: 21

50 related records in total

[31] A new approach to feature selection for text categorization
Li, SS
Zong, CQ
PROCEEDINGS OF THE 2005 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (IEEE NLP-KE'05), 2005, : 626 - 630
[32] Robust biomarker identification for cancer diagnosis with ensemble feature selection methods
Abeel, Thomas
Helleputte, Thibault
Van de Peer, Yves
Dupont, Pierre
Saeys, Yvan
BIOINFORMATICS, 2010, 26 (03) : 392 - 398
[33] Unsupervised feature selection for biomarker identification in chromatography and gene expression data
Strickert, Marc
Sreenivasulu, Nese
Peterek, Silke
Weschke, Winfriede
Mock, Hans-Peter
Seiffert, Udo
ARTIFICIAL NEURAL NETWORKS IN PATTERN RECOGNITION, PROCEEDINGS, 2006, 4087 : 274 - 285
[34] A new ranking-based stability measure for feature selection algorithms
Rakesh, Deepak Kumar
Anwit, Raj
Jana, Prasanta K.
SOFT COMPUTING, 2023, 27 (09) : 5377 - 5396
[36] Stable feature selection for biomarker discovery
He, Zengyou
Yu, Weichuan
COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2010, 34 (04) : 215 - 225
[37] UNSUPERVISED FEATURE RANKING AND SELECTION BASED ON AUTOENCODERS
Sharifipour, Sasan
Fayyazi, Hossein
Sabokrou, Mohammad
Adeli, Ehsan
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3172 - 3176
[38] Heuristic search over a ranking for feature selection
Ruiz, R
Riquelme, JC
Aguilar-Ruiz, JS
COMPUTATIONAL INTELLIGENCE AND BIOINSPIRED SYSTEMS, PROCEEDINGS, 2005, 3512 : 742 - 749
[39] Neighborhood Ranking-Based Feature Selection
Ipkovich, Adam
Abonyi, Janos
IEEE ACCESS, 2024, 12 : 20152 - 20168
[40] A new unsupervised fuzzy feature ranking measure for feature evaluation
Foroutan, Farzane
Eftekhari, Mahdi
2013 13TH IRANIAN CONFERENCE ON FUZZY SYSTEMS (IFSC), 2013,
