Large-scale gene expression data clustering through incremental ensemble approach

Cited by: 0
Authors
Khan, Imran [1]
Shaikh, Abdul Khalique [2]
Adhikari, Naresh [3]
Affiliations
[1] Sultan Qaboos Univ, Coll Sci, Dept Comp Sci, Muscat, Oman
[2] Sultan Qaboos Univ, Coll Econ & Polit Sci, Dept Informat Syst, Muscat, Oman
[3] Slippery Rock Univ, Dept Comp Sci, 1 Morrow Way, Slippery Rock, PA USA
Keywords
ensemble clustering; gene expression; high dimensional; IECG; extreme learning machine
DOI
10.1088/2632-2153/ad81ca
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
DNA microarray technology monitors gene activity in living organisms in real time. It produces large amounts of data that help scientists learn how genes work. Clustering these data helps reveal gene interactions and uncover important biological processes. However, traditional clustering techniques struggle with the very high dimensionality of gene expression data and the complexity of biological networks. Ensemble clustering is a viable strategy, but traditional ensemble approaches may also be poorly suited to such high-dimensional data. This study introduces a novel technique for clustering gene expression data called incremental ensemble clustering for gene expression data (IECG). IECG has two steps. In the first step, the gene expression data in a window are clustered, producing a tree of clusters; this procedure is repeated for succeeding windows, each of which has a distinct feature set. In the second step, the base clusterings of two consecutive windows are combined using a new objective function to form a new clustering solution. Repeating this step for further windows extracts reliable patterns that are useful for medical applications. Results on both biological and non-biological data show that the proposed algorithm outperforms state-of-the-art algorithms. The running time of the proposed algorithm is also analyzed.
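The abstract describes the IECG pipeline only at a high level, so the Python sketch below is illustrative and is not the authors' method. It assumes that windows are contiguous blocks of feature columns, that each window's "tree of clusters" is an average-linkage dendrogram cut at `k` clusters, and that two consecutive clusterings are combined with an equal-weight co-association (evidence-accumulation) consensus. That consensus is a swapped-in technique standing in for the paper's objective function, which the abstract does not specify. All function names (`incremental_ensemble`, `cluster_window`, `average_linkage`, `coassociation`) are hypothetical.

```python
from itertools import combinations
from math import dist

# Hypothetical sketch, NOT the published IECG algorithm. Assumptions:
#  * windows are contiguous blocks of feature columns;
#  * each window's "tree of clusters" is an average-linkage dendrogram cut at k;
#  * consecutive clusterings are merged by an equal-weight co-association
#    consensus, standing in for the paper's objective function (not given here).


def average_linkage(dmat, k):
    """Naive average-linkage agglomerative clustering of a distance matrix, cut at k clusters."""
    clusters = [[i] for i in range(len(dmat))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Mean pairwise distance between the members of clusters a and b.
            d = sum(dmat[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # a < b, so deleting b leaves a's index intact
        del clusters[b]
    labels = [0] * len(dmat)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def cluster_window(window, k):
    """Cluster genes (rows) using only the feature columns of one window."""
    dmat = [[dist(x, y) for y in window] for x in window]
    return average_linkage(dmat, k)


def coassociation(labels):
    """Entry (i, j) is 1.0 when genes i and j share a cluster, else 0.0."""
    return [[1.0 if li == lj else 0.0 for lj in labels] for li in labels]


def incremental_ensemble(data, window_size, k):
    """Cluster window by window, merging each new base clustering into the running solution."""
    current = None
    for start in range(0, len(data[0]), window_size):
        window = [row[start:start + window_size] for row in data]
        base = cluster_window(window, k)
        if current is None:
            current = base  # the first window seeds the solution
            continue
        # Equal-weight consensus of the running solution and the new window.
        a, b = coassociation(current), coassociation(base)
        dmat = [[1.0 - 0.5 * (x + y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
        current = average_linkage(dmat, k)
    return current


data = [
    [0.0, 0.1, 5.0, 5.1],
    [0.2, 0.0, 5.2, 4.9],
    [0.1, 0.2, 4.8, 5.0],
    [5.0, 5.1, 0.0, 0.2],
    [4.9, 5.2, 0.1, 0.0],
    [5.1, 4.8, 0.2, 0.1],
]
print(incremental_ensemble(data, window_size=2, k=2))  # → [0, 0, 0, 1, 1, 1]
```

In this toy input both windows separate genes 0 to 2 from genes 3 to 5, so the consensus keeps that split. The naive linkage loop is only suitable for small examples; a practical version would use an optimized hierarchical clustering routine.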
Pages: 13