G-Forest: An ensemble method for cost-sensitive feature selection in gene expression microarrays

Cited by: 28
Authors
Abdulla, Mai [1 ,2 ]
Khasawneh, Mohammad T. [1 ]
Affiliations
[1] SUNY Binghamton, Dept Syst Sci & Ind Engn, Binghamton, NY 13902 USA
[2] 9301 Avondale RD NE,Apt K1060, Redmond, WA 98052 USA
Keywords
Feature selection; Cost-sensitive; Genetic algorithm; Random Forest; Microarray gene expression; Silent diseases' diagnosis; CANCER CLASSIFICATION; ALGORITHM; HYBRID; FRAMEWORK; MACHINE; DISCOVERY
DOI
10.1016/j.artmed.2020.101941
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Microarray gene expression profiling has emerged as an efficient technique for cancer diagnosis, prognosis, and treatment. One of its major drawbacks is the "curse of dimensionality", which hinders the usefulness of the information in the datasets and leads to computational instability. In recent years, feature selection techniques have emerged as effective tools for identifying disease biomarkers to aid in medical screening and diagnosis. However, existing feature selection techniques, first, do not suit the rare variance that exists in genomic data and, second, do not consider the feature cost (i.e., gene cost). Because ignoring feature costs may result in high-cost gene profiling, this study proposes a new algorithm, called G-Forest, for cost-sensitive feature selection in gene expression microarrays. G-Forest is an ensemble cost-sensitive feature selection algorithm that develops a population of biases for a Random Forest induction algorithm. It embeds the feature cost in the feature selection process and allows the simultaneous selection of low-cost and highly informative features. In particular, when constructing the initial population, each feature is randomly selected with a probability inversely proportional to its associated cost. G-Forest was compared with multiple state-of-the-art algorithms, and the experimental results showed its effectiveness and robustness in selecting the least costly and most informative genes. On average, G-Forest improved accuracy by up to 14% and decreased costs by up to 56% compared with the other approaches tested in this article.
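As an illustration only (not the authors' implementation), the sketch below shows the cost-inverse sampling idea described in the abstract: when building the initial population of candidate gene subsets, each gene is drawn with probability inversely proportional to an assumed profiling cost. All names, parameters, and defaults here are hypothetical.

```python
import numpy as np


def init_population(feature_costs, subset_size, population_size, seed=None):
    """Sample an initial population of candidate gene subsets.

    Each gene is drawn without replacement with probability inversely
    proportional to its profiling cost, so cheaper genes are favored while
    expensive genes can still be selected. Illustrative sketch only; this
    is not the paper's code.
    """
    rng = np.random.default_rng(seed)
    costs = np.asarray(feature_costs, dtype=float)
    weights = 1.0 / costs                    # inverse-cost weighting
    probs = weights / weights.sum()          # normalize into a distribution
    return [
        rng.choice(costs.size, size=subset_size, replace=False, p=probs)
        for _ in range(population_size)
    ]


# Hypothetical usage: 1,000 genes with random costs, 30 genes per candidate
# subset, and a population of 50 candidate subsets.
gene_costs = np.random.default_rng(0).uniform(1.0, 10.0, size=1000)
population = init_population(gene_costs, subset_size=30, population_size=50, seed=1)
print(len(population), sorted(population[0])[:5])
```

Per the abstract, each candidate subset presumably then biases a Random Forest learner, and the ensemble search retains subsets that balance classification accuracy against total gene cost; the exact fitness function is not given in this record.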
Pages: 11
Related Papers (50 records in total)
  • [1] Ensemble based Cost-Sensitive Feature Selection for Consolidated Knowledge Base Creation
    Ali, Syed Imran
    Lee, Sungyoung
    PROCEEDINGS OF THE 2020 14TH INTERNATIONAL CONFERENCE ON UBIQUITOUS INFORMATION MANAGEMENT AND COMMUNICATION (IMCOM), 2020,
  • [2] Cost-Sensitive Feature Selection on Heterogeneous Data
    Qian, Wenbin
    Shu, Wenhao
    Yang, Jun
    Wang, Yinglong
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PART II, 2015, 9078 : 397 - 408
  • [3] A Cost-Sensitive Feature Selection Method for High-Dimensional Data
    An, Chaojie
    Zhou, Qifeng
    14TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND EDUCATION (ICCSE 2019), 2019, : 1089 - 1094
  • [4] Cost-sensitive selection of variables by ensemble of model sequences
    Yan, Donghui
    Qin, Zhiwei
    Gu, Songxiang
    Xu, Haiping
    Shao, Ming
    KNOWLEDGE AND INFORMATION SYSTEMS, 2021, 63 (05) : 1069 - 1092
  • [6] A cost-sensitive rotation forest algorithm for gene expression data classification
    Lu, Huijuan
    Yang, Lei
    Yan, Ke
    Xue, Yu
    Gao, Zhigang
    NEUROCOMPUTING, 2017, 228 : 270 - 276
  • [7] Cost-Sensitive Ensemble Feature Ranking and Automatic Threshold Selection for Chronic Kidney Disease Diagnosis
    Imran Ali, Syed
    Ali, Bilal
    Hussain, Jamil
    Hussain, Musarrat
    Satti, Fahad Ahmed
    Park, Gwang Hoon
    Lee, Sungyoung
    APPLIED SCIENCES-BASEL, 2020, 10 (16):
  • [8] Cost-Sensitive Feature Selection for Class Imbalance Problem
    Bach, Malgorzata
    Werner, Aleksandra
    INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, PT I, 2018, 655 : 182 - 194
  • [9] Cost-sensitive Feature Selection for Support Vector Machines
    Benitez-Pena, S.
    Blanquero, R.
    Carrizosa, E.
    Ramirez-Cobo, P.
    COMPUTERS & OPERATIONS RESEARCH, 2019, 106 : 169 - 178
  • [10] Cost-sensitive feature selection using random forest: Selecting low-cost subsets of informative features
    Zhou, Qifeng
    Zhou, Hao
    Li, Tao
    KNOWLEDGE-BASED SYSTEMS, 2016, 95 : 1 - 11