Applications of Feature Selection Techniques on Large Biomedical Datasets

Authors
Ewen, Nicolas [1]
Abdou, Tamer [1, 2]
Bener, Ayse [1]
Affiliations
[1] Ryerson Univ, Data Sci Lab, Toronto, ON M5B 2K3, Canada
[2] Arish Univ, Fac Sci, North Sinai 45516, Egypt
Keywords
Feature selection; Bio-medical; Large dataset
DOI
10.1007/978-3-030-18305-9_57
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper aims to determine which feature selection algorithm works best on large biomedical datasets, where feature selection has a potentially important role in analysis. We applied four techniques: Information Gain, Chi-Squared, Markov Blanket Discovery, and Recursive Feature Elimination. For each, we measured the efficiency of the selection, the stability of the algorithm, and the quality of the features chosen. The Information Gain and Chi-Squared filters were the most efficient and stable. Markov Blanket Discovery and Recursive Feature Elimination both took significantly longer to select features and were less stable. Recursive Feature Elimination selected the highest-quality features, followed by Information Gain and Chi-Squared, with Markov Blanket Discovery last. For education (e.g., those in the biomedical field teaching data techniques), we recommend the Information Gain or Chi-Squared filter. For research or analysis, we recommend one of the filters or Recursive Feature Elimination, depending on the situation. We do not recommend Markov Blanket Discovery for the situations tested in this trial, keeping in mind that the experiments were not exhaustive.
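To make the two filter methods named in the abstract concrete, here is a minimal sketch of how Information Gain and Chi-Squared scores can rank discrete features. It is not the authors' implementation: the function names, the toy data, and the feature names `marker_a` and `marker_b` are all hypothetical, and the sketch assumes categorical features and labels.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for one discrete feature column."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def chi_squared(feature, labels):
    """Pearson chi-squared statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n
            stat += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return stat


def rank_features(columns, labels, score):
    """Return feature names sorted from highest to lowest score."""
    return sorted(columns, key=lambda name: score(columns[name], labels), reverse=True)


# Hypothetical toy data: two binary features, six samples.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "marker_a": [1, 1, 1, 0, 0, 0],  # matches the label exactly
    "marker_b": [1, 0, 1, 0, 1, 0],  # only weakly related to the label
}

ig_ranking = rank_features(columns, labels, information_gain)
chi2_ranking = rank_features(columns, labels, chi_squared)
```

Both filters score each feature independently of any classifier, which is one plausible reason such filters tend to be fast. Recursive Feature Elimination, by contrast, repeatedly refits a model while discarding features.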
Pages: 543-548
Page count: 6