Applications of Feature Selection Techniques on Large Biomedical Datasets

Authors
Ewen, Nicolas [1]
Abdou, Tamer [1, 2]
Bener, Ayse [1]
Affiliations
[1] Ryerson Univ, Data Sci Lab, Toronto, ON M5B 2K3, Canada
[2] Arish Univ, Fac Sci, North Sinai 45516, Egypt
Source
Keywords
Feature selection; Bio-medical; Large dataset;
DOI
10.1007/978-3-030-18305-9_57
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper aims to determine which feature selection algorithm is best suited to large biomedical datasets, where feature selection can play an important role in analysis. Four feature selection techniques were applied to large biomedical datasets: Information Gain, Chi-Squared, Markov Blanket Discovery, and Recursive Feature Elimination. We measured the efficiency of the selection, the stability of the algorithms, and the quality of the features chosen. Of the four techniques, the Information Gain and Chi-Squared filters were the most efficient and stable. Both Markov Blanket Discovery and Recursive Feature Elimination took significantly longer to select features and were less stable. The features selected by Recursive Feature Elimination were of the highest quality, followed by those of Information Gain and Chi-Squared, with Markov Blanket Discovery last. For educational purposes (e.g., for those in the biomedical field teaching data techniques), we recommend the Information Gain or Chi-Squared filter. For research or analysis, we recommend one of the filters or Recursive Feature Elimination, depending on the situation. We do not recommend Markov Blanket Discovery for the situations considered in this trial, keeping in mind that the experiments were not exhaustive.
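To make the two filter methods and the notion of stability concrete, the following is a minimal sketch, not the authors' implementation. It assumes discrete features and labels, uses only the Python standard library, and runs on a tiny made-up dataset. All function names (information_gain, chi_squared, top_k, jaccard) and the use of Jaccard overlap as a stability measure are illustrative assumptions; the abstract does not say how stability or efficiency were actually measured. Markov Blanket Discovery and Recursive Feature Elimination are not sketched here.

```python
import math
import time
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for one discrete feature column."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def chi_squared(feature, labels):
    """Pearson chi-squared statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for value, f_count in feature_totals.items():
        for label, l_count in label_totals.items():
            expected = f_count * l_count / n
            stat += (observed.get((value, label), 0) - expected) ** 2 / expected
    return stat


def top_k(columns, labels, score, k):
    """Indices of the k feature columns with the highest filter score."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: score(columns[j], labels),
                    reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Overlap of two selected-feature sets (assumed stability measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical toy data: 8 samples, 3 binary features stored column-wise.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the label
    [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
    [0, 0, 0, 1, 1, 1, 1, 1],  # agrees with the label on 7 of 8 samples
]

for name, score in (("information gain", information_gain),
                    ("chi-squared", chi_squared)):
    start = time.perf_counter()
    selected = top_k(columns, labels, score, k=2)
    elapsed = time.perf_counter() - start  # crude stand-in for efficiency
    print(name, sorted(selected), f"{elapsed:.6f}s")

# Agreement between the two filters' selections on this toy data.
print(jaccard(top_k(columns, labels, information_gain, 2),
              top_k(columns, labels, chi_squared, 2)))
```

In a setup like this, stability would more typically be estimated by repeating the selection on resampled subsets of the data and averaging the pairwise overlap of the selected sets; the single comparison above only illustrates the overlap computation.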
Pages: 543-548
Page count: 6