Large-scale gene expression data clustering through incremental ensemble approach

Authors
Khan, Imran [1]
Shaikh, Abdul Khalique [2]
Adhikari, Naresh [3]
Affiliations
[1] Sultan Qaboos Univ, Coll Sci, Dept Comp Sci, Muscat, Oman
[2] Sultan Qaboos Univ, Coll Econ & Polit Sci, Dept Informat Syst, Muscat, Oman
[3] Slippery Rock Univ, Dept Comp Sci, 1 Morrow Way, Slippery Rock, PA USA
Keywords
ensemble clustering; gene expression; high dimensional; IECG; extreme learning machine
DOI
10.1088/2632-2153/ad81ca
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
DNA microarray technology monitors gene activity in living organisms in real time. It generates large volumes of data that help scientists learn how genes work. Clustering these data helps reveal gene interactions and uncover important biological processes. However, traditional clustering techniques struggle with the very high dimensionality of gene expression data and the intricacy of biological networks. Although ensemble clustering is a viable strategy, traditional ensemble approaches may not suit such high-dimensional data. This study introduces a novel technique for clustering gene expression data, called incremental ensemble clustering for gene expression data (IECG). IECG has two steps. The first step presents a technique for grouping gene expression data into windows, producing a tree of clusters. This procedure is repeated for succeeding windows, each of which has a distinct feature set. The base clusterings of two consecutive windows are then ensembled using a new objective function to form a new clustering solution. Repeating this step-by-step method for further windows extracts reliable patterns that are useful for medical applications. Results on both biological and non-biological data show that the proposed algorithm outperforms state-of-the-art algorithms. The running time of the proposed algorithm is also examined.
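The window-by-window idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' IECG implementation: the abstract does not give the function used to ensemble consecutive windows, so the sketch swaps in a plain co-association (evidence accumulation) consensus, uses a small deterministic k-means as the base clusterer instead of a cluster tree, and all function names and the toy data are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def base_clustering(rows, k, iters=10):
    """Assumed base clusterer: k-means with farthest-point initialisation."""
    centers = [list(rows[0])]
    while len(centers) < k:
        # Next center is the row farthest from all centers chosen so far.
        far = max(rows, key=lambda r: min(dist2(r, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(labels):
    """1.0 where two samples share a cluster, 0.0 otherwise."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def merge(consensus, co, seen):
    """Running mean of co-association matrices over the windows seen so far."""
    n = len(co)
    return [[(consensus[i][j] * seen + co[i][j]) / (seen + 1) for j in range(n)]
            for i in range(n)]


def consensus_labels(consensus, threshold=0.5):
    """Link samples whose consensus score reaches the threshold."""
    n = len(consensus)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and consensus[i][j] >= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def incremental_ensemble(data, window_size, k):
    """Cluster each feature window, folding it into the running consensus."""
    consensus, seen = None, 0
    for start in range(0, len(data[0]), window_size):
        window = [row[start:start + window_size] for row in data]
        co = co_association(base_clustering(window, k))
        consensus = co if consensus is None else merge(consensus, co, seen)
        seen += 1
    return consensus_labels(consensus)


# Toy data (made up): 6 samples x 4 features, two well-separated groups.
toy = [
    [0.1, 0.2, 0.0, 0.3],
    [0.2, 0.1, 0.2, 0.1],
    [0.0, 0.3, 0.1, 0.2],
    [5.0, 5.2, 4.9, 5.1],
    [5.1, 4.9, 5.2, 5.0],
    [4.9, 5.0, 5.1, 5.3],
]
print(incremental_ensemble(toy, window_size=2, k=2))  # [0, 0, 0, 1, 1, 1]
```

Only one sample-by-sample consensus matrix is kept in memory, and each window touches just its own slice of features. This is one plausible reason an incremental scheme suits very high-dimensional data, though the paper's actual design may differ.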
Pages: 13