A new clustering method of gene expression data based on multivariate Gaussian mixture models

Cited by: 15
Authors
Liu, Zhe [1 ,2 ]
Song, Yu-qing [1 ]
Xie, Cong-hua [3 ]
Tang, Zheng [1 ]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Telecommun, Room 522, Zhenjiang, Jiangsu, Peoples R China
[2] Jilin Normal Univ, Sch Comp Sci, Siping, Jilin Province, Peoples R China
[3] Changshu Inst Technol, Sch Comp Sci & Engn, Suzhou, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education
Keywords
Gene expression data; Clustering; Multivariate Gaussian mixture models; Expectation maximization; QAIC criterion; K-MEANS; ALGORITHM;
DOI
10.1007/s11760-015-0749-5
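Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification codes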
0808 ; 0809 ;
Abstract
Clustering gene expression data is an important problem in bioinformatics because understanding which genes behave similarly can lead to the discovery of important biological information. Many clustering methods have been used for gene clustering. This paper proposes a new method for clustering gene expression data based on an improved expectation maximization (EM) method for multivariate Gaussian mixture models. To reduce the over-reliance of classical EM on its initialization, we propose a remove-and-add initialization and apply a random perturbation to the solution before continuing the EM iterations. The number of clusters is estimated with the quasi-Akaike information criterion (QAIC). The improved EM method is compared with several other clustering methods on simulated and real gene expression data sets. Our results indicate that the improved EM clustering method is superior to the other clustering algorithms and can be widely used for gene clustering.
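As a rough illustration of the pipeline the abstract outlines (fit a Gaussian mixture with EM, then choose the number of clusters by QAIC), the following is a minimal sketch and not the authors' implementation. It assumes plain EM with random initialization instead of the paper's remove-and-add initialization and perturbation step, diagonal covariances for brevity, and the common textbook form QAIC = -2 log L / c_hat + 2K with a user-supplied c_hat (with c_hat = 1 it reduces to AIC). The paper's exact criterion may differ, and all function names here are hypothetical.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (a simplification)."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm(data, k, n_iter=50, seed=0, var_floor=1e-6):
    """Plain EM for a k-component mixture; returns weights, means, variances
    and the log-likelihood computed in the last E-step."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # Assumption: means start at random data points, variances at the global variance.
    means = [list(p) for p in rng.sample(data, k)]
    gmean = [sum(x[j] for x in data) / n for j in range(d)]
    gvar = [max(sum((x[j] - gmean[j]) ** 2 for x in data) / n, var_floor)
            for j in range(d)]
    variances = [list(gvar) for _ in range(k)]
    weights = [1.0 / k] * k
    loglik = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point.
        resp, loglik = [], 0.0
        for x in data:
            logp = [math.log(weights[c]) + log_gauss_diag(x, means[c], variances[c])
                    for c in range(k)]
            norm = logsumexp(logp)
            loglik += norm
            resp.append([math.exp(lp - norm) for lp in logp])
        # M-step: re-estimate weights, means and per-dimension variances.
        for c in range(k):
            nk = sum(r[c] for r in resp) + 1e-12  # guard against empty components
            weights[c] = nk / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nk
                        for j in range(d)]
            variances[c] = [
                max(sum(r[c] * (x[j] - means[c][j]) ** 2
                        for r, x in zip(resp, data)) / nk, var_floor)
                for j in range(d)
            ]
    return weights, means, variances, loglik


def n_free_params(k, d):
    """(k - 1) mixing weights + k*d means + k*d diagonal variances."""
    return (k - 1) + 2 * k * d


def qaic(loglik, n_params, c_hat=1.0):
    """Quasi-AIC in its common form; c_hat is an assumed overdispersion factor."""
    return -2.0 * loglik / c_hat + 2.0 * n_params


def select_k(data, candidates, c_hat=1.0):
    """Fit one mixture per candidate k and return the k with the lowest QAIC."""
    d = len(data[0])
    scores = {k: qaic(fit_gmm(data, k)[3], n_free_params(k, d), c_hat)
              for k in candidates}
    return min(scores, key=scores.get), scores
```

Calling `select_k(data, [1, 2, 3])` on a list of equal-length numeric vectors returns the selected number of components together with the per-candidate scores. Full covariance matrices, as the term "multivariate Gaussian mixture" usually implies, would need a linear algebra library and are left out of this sketch.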
Pages: 359-368
Page count: 10
Related papers
50 in total
  • [1] A new clustering method of gene expression data based on multivariate Gaussian mixture models
    Zhe Liu
    Yu-qing Song
    Cong-hua Xie
    Zheng Tang
    Signal, Image and Video Processing, 2016, 10 : 359 - 368
  • [2] Multivariate data clustering for the Gaussian mixture model
    Kavaliauskas, M
    Rudzkis, R
    INFORMATICA, 2005, 16 (01) : 61 - 74
  • [3] Clustering of gene expression data by mixture of PCA models
    Yoshioka, T
    Morioka, R
    Kobayashi, K
    Oba, S
    Ogasawara, N
    Ishii, S
    ARTIFICIAL NEURAL NETWORKS - ICANN 2002, 2002, 2415 : 522 - 527
  • [4] Clustering gene expression data analysis using an improved EM algorithm based on multivariate elliptical contoured mixture models
    Liu, Zhe
    Song, Yu-qing
    Xie, Cong-hua
    Zhu, Feng
    Bao, Xiang
    OPTIK, 2014, 125 (21) : 6388 - 6394
  • [5] Model-based clustering of microarray expression data via latent Gaussian mixture models
    McNicholas, Paul D.
    Murphy, Thomas Brendan
    BIOINFORMATICS, 2010, 26 (21) : 2705 - 2712
  • [6] Multivariate data imputation using Gaussian mixture models
    Silva, Diogo S. F.
    Deutsch, Clayton V.
    SPATIAL STATISTICS, 2018, 27 : 74 - 90
  • [7] Gene expression clustering with functional mixture models
    Chudova, D
    Hart, C
    Mjolsness, E
    Smyth, P
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 16, 2004, 16 : 683 - 690
  • [8] GMMchi: gene expression clustering using Gaussian mixture modeling
    Liu, Ta-Chun
    Kalugin, Peter N.
    Wilding, Jennifer L.
    Bodmer, Walter F.
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [9] A data structure and function classification based method to evaluate clustering models for gene expression data
    易东
    杨梦苏
    黄明辉
    李辉智
    王文昌
    Journal of Medical Colleges of PLA, 2002, (04) : 312 - 317