Large-scale gene expression data clustering through incremental ensemble approach

Authors
Khan, Imran [1]
Shaikh, Abdul Khalique [2]
Adhikari, Naresh [3]
Affiliations
[1] Sultan Qaboos Univ, Coll Sci, Dept Comp Sci, Muscat, Oman
[2] Sultan Qaboos Univ, Coll Econ & Polit Sci, Dept Informat Syst, Muscat, Oman
[3] Slippery Rock Univ, Dept Comp Sci, 1 Morrow Way, Slippery Rock, PA, USA
Keywords
ensemble clustering; gene expression; high dimensional; IECG; extreme learning machine
DOI
10.1088/2632-2153/ad81ca
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
DNA microarray technology monitors gene activity in living organisms in real time. It produces large volumes of data that help scientists learn how genes work. Clustering these data helps reveal gene interactions and uncover important biological processes. However, traditional clustering techniques struggle with the very high dimensionality of gene expression data and the intricacy of biological networks. Ensemble clustering is a viable strategy, but traditional ensemble approaches may not suit such high-dimensional data. This study introduces a novel technique for clustering gene expression data, called incremental ensemble clustering for gene expression data (IECG). IECG has two steps. The first step presents a technique for grouping gene expression data into windows, producing a tree of clusters. This procedure is repeated for succeeding windows, each of which has a distinct feature set. The base clusterings of two consecutive windows are then ensembled using a new objective function to form a new clustering solution. Repeating this step-by-step method over further windows extracts reliable patterns that are useful for medical applications. Results on both biological and non-biological data show that the proposed algorithm outperforms state-of-the-art algorithms. The running time of the proposed algorithm is also examined.
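The windowed, incremental idea in the abstract can be illustrated with a minimal sketch. It is not the authors' IECG implementation: the abstract does not specify the objective function used to merge consecutive windows, so a running co-association average (evidence accumulation) is swapped in as a stand-in. The function names, the `window_size` and `k` parameters, and the toy data are all hypothetical.

```python
def agglomerate(dist, k):
    """Average-linkage agglomerative clustering on a distance matrix.

    Merges the two closest clusters until k remain, then returns one
    integer label per item.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def window_distances(data, features):
    """Euclidean distances between rows, using only the given feature columns."""
    n = len(data)
    return [
        [sum((data[i][f] - data[j][f]) ** 2 for f in features) ** 0.5
         for j in range(n)]
        for i in range(n)
    ]


def incremental_ensemble(data, window_size, k):
    """Cluster one feature window at a time and fold each base clustering
    into a running consensus.

    ASSUMPTION: the consensus is a running mean of co-association
    indicators, not the objective function proposed in the paper.
    """
    n, m = len(data), len(data[0])
    coassoc = [[0.0] * n for _ in range(n)]
    labels = None
    for w, start in enumerate(range(0, m, window_size), start=1):
        features = range(start, min(start + window_size, m))
        base = agglomerate(window_distances(data, features), k)
        # Update the running mean of "rows i and j were clustered together".
        for i in range(n):
            for j in range(n):
                agree = 1.0 if base[i] == base[j] else 0.0
                coassoc[i][j] += (agree - coassoc[i][j]) / w
        # Consensus after this window: cluster on 1 - co-association.
        consensus = [[1.0 - coassoc[i][j] for j in range(n)] for i in range(n)]
        labels = agglomerate(consensus, k)
    return labels


# Toy data (hypothetical): 8 rows, 6 features, two well-separated groups.
toy = [[0.0 + 0.1 * ((i * 7 + f * 3) % 5) for f in range(6)] for i in range(4)]
toy += [[5.0 + 0.1 * ((i * 7 + f * 3) % 5) for f in range(6)] for i in range(4)]
print(incremental_ensemble(toy, window_size=2, k=2))
```

Because only one window of features is clustered at a time, the distance computation never touches the full feature set at once, which is the property that makes a windowed scheme attractive for high-dimensional data.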
Pages: 13