Applications of Feature Selection Techniques on Large Biomedical Datasets

Authors
Ewen, Nicolas [1]
Abdou, Tamer [1, 2]
Bener, Ayse [1]
Affiliations
[1] Ryerson Univ, Data Sci Lab, Toronto, ON M5B 2K3, Canada
[2] Arish Univ, Fac Sci, North Sinai 45516, Egypt
Keywords
Feature selection; Bio-medical; Large dataset
DOI
10.1007/978-3-030-18305-9_57
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The main goal of this paper is to determine the best feature selection algorithm to use on large biomedical datasets, where feature selection can play a useful role in analysis. Four feature selection techniques were applied to large biomedical datasets: Information Gain, Chi-Squared, Markov Blanket Discovery, and Recursive Feature Elimination. We measured the efficiency of the selection, the stability of the algorithms, and the quality of the features chosen. Of the four techniques, the Information Gain and Chi-Squared filters were the most efficient and stable. Both Markov Blanket Discovery and Recursive Feature Elimination took significantly longer to select features and were less stable. The features selected by Recursive Feature Elimination were of the highest quality, followed by those of Information Gain and Chi-Squared, with Markov Blanket Discovery placing last. For educational purposes (e.g., those in the biomedical field teaching data techniques), we recommend the Information Gain or Chi-Squared filter. For research or analysis, we recommend one of the filters or Recursive Feature Elimination, depending on the situation. We do not recommend Markov Blanket Discovery for the situations used in this trial, keeping in mind that the experiments were not exhaustive.
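As a rough illustration of the two filter methods named in the abstract, the sketch below scores discrete features by Information Gain and by the Pearson Chi-Squared statistic and keeps the top-ranked ones. It is not the authors' implementation: the function names, the toy data (`marker_a`, `marker_b`, `marker_c`), and the Jaccard overlap used as a stand-in stability measure are all assumptions made for illustration, since the abstract does not say how stability was measured.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus their expected entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def chi_squared(feature, labels):
    """Pearson chi-squared statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            expected = f_count * c_count / n
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat

def top_k(columns, labels, score, k):
    """Names of the k highest-scoring features under the given score function."""
    ranked = sorted(columns, key=lambda name: score(columns[name], labels), reverse=True)
    return ranked[:k]

def jaccard(selected_a, selected_b):
    """Overlap of two selected feature sets (assumed stability proxy, not the paper's metric)."""
    a, b = set(selected_a), set(selected_b)
    return len(a & b) / len(a | b)

# Hypothetical toy data: three binary features and a binary outcome.
columns = {
    "marker_a": [1, 1, 1, 1, 0, 0, 0, 0],  # identical to the outcome
    "marker_b": [1, 1, 1, 0, 1, 0, 0, 0],  # partly informative
    "marker_c": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the outcome
}
outcome = [1, 1, 1, 1, 0, 0, 0, 0]

by_ig = top_k(columns, outcome, information_gain, k=2)
by_chi2 = top_k(columns, outcome, chi_squared, k=2)
print(by_ig, by_chi2, jaccard(by_ig, by_chi2))
```

Both filters score each feature independently of the others, which is one reason such methods tend to be fast; a wrapper such as Recursive Feature Elimination instead retrains a model repeatedly while discarding features.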
Pages: 543-548
Number of pages: 6