Large-scale gene expression data clustering through incremental ensemble approach

Cited by: 0
Authors
Khan, Imran [1]
Shaikh, Abdul Khalique [2]
Adhikari, Naresh [3]
Affiliations
[1] Sultan Qaboos Univ, Coll Sci, Dept Comp Sci, Muscat, Oman
[2] Sultan Qaboos Univ, Coll Econ & Polit Sci, Dept Informat Syst, Muscat, Oman
[3] Slippery Rock Univ, Dept Comp Sci, 1 Morrow Way, Slippery Rock, PA USA
Source
Keywords
ensemble clustering; gene expression; high dimensional; IECG; extreme learning machine
DOI
10.1088/2632-2153/ad81ca
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
DNA microarray technology monitors gene activity in living organisms in real time and generates large volumes of data that help scientists learn how genes work. Clustering these data helps reveal gene interactions and uncover important biological processes. However, traditional clustering techniques struggle with the enormous dimensionality of gene expression data and the intricacy of biological networks. Ensemble clustering is a viable strategy, but traditional approaches may not suit such high-dimensional data. This study introduces a novel technique for clustering gene expression data, called incremental ensemble clustering for gene expression data (IECG). IECG has two steps. The first step presents a technique for grouping gene expression data into windows, producing a tree of clusters. This procedure is repeated for succeeding windows, each of which has a distinct feature set. The base clusterings of two consecutive windows are then ensembled using a new objective function to form a new clustering solution. Repeating this step-by-step method for further windows extracts reliable patterns that are useful for medical applications. Results on both biological and non-biological data show that the proposed algorithm outperforms state-of-the-art algorithms. The running time of the proposed algorithm is also examined.
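The window-by-window idea in the abstract can be illustrated with a small sketch. This is not the IECG algorithm: the abstract does not specify the objective function or how the cluster tree is built, so the sketch assumes a plain k-means as the base clusterer and a running co-association matrix as a stand-in for the ensemble step. All function names, the threshold of 0.5, and the toy data are hypothetical.

```python
# Illustrative sketch only: NOT the authors' IECG method.
# Assumptions: k-means base clusterer, co-association consensus, toy data.

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iters=20):
    """Plain k-means; farthest-first initialisation keeps it deterministic."""
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(dist2(r, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def feature_windows(n_features, width):
    """Split feature indices into consecutive, non-overlapping windows."""
    return [range(s, min(s + width, n_features)) for s in range(0, n_features, width)]

def consensus(coassoc, threshold=0.5):
    """Connected components of the graph linking rows with co-association >= threshold."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def incremental_ensemble(data, k, width):
    """Cluster each feature window, then fold it into the running consensus."""
    n = len(data)
    coassoc = [[0.0] * n for _ in range(n)]
    result = None
    for t, window in enumerate(feature_windows(len(data[0]), width), start=1):
        view = [[row[f] for f in window] for row in data]
        base = kmeans(view, k)
        for i in range(n):
            for j in range(n):
                same = 1.0 if base[i] == base[j] else 0.0
                coassoc[i][j] += (same - coassoc[i][j]) / t  # running mean over windows
        result = consensus(coassoc)  # updated solution after each new window
    return result

# Made-up toy matrix: 6 rows (e.g. samples) x 4 features, two obvious groups.
data = [
    [0.1, 0.2, 0.0, 0.1], [0.0, 0.1, 0.2, 0.0], [0.2, 0.0, 0.1, 0.2],
    [5.0, 5.1, 4.9, 5.2], [5.2, 4.9, 5.0, 5.1], [4.9, 5.0, 5.1, 5.0],
]
print(incremental_ensemble(data, k=2, width=2))  # [0, 0, 0, 1, 1, 1]
```

Only one window of features is clustered at a time, and the consensus is refreshed after each window, which is the property the abstract highlights for high-dimensional data. The co-association matrix here grows quadratically with the number of rows, so a practical implementation would likely need a more compact representation.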
Pages: 13