Semi-supervised clustering for gene-expression data in multiobjective optimization framework

Cited by: 25
Authors
Alok, Abhay Kumar [1]
Saha, Sriparna [1]
Ekbal, Asif [1]
Affiliations
[1] Indian Inst Technol, Comp Sci Engn, Patna, Bihar, India
Keywords
Gene expression data clustering; Semi-supervised classification; Multiobjective optimization; Cluster validity index; AMOSA; TRANSCRIPTIONAL PROGRAM; OLIGONUCLEOTIDE ARRAYS; COEXPRESSED GENES; ALGORITHM; MICROARRAY; PATTERNS; CLASSIFICATION; INDEXES;
DOI
10.1007/s13042-015-0335-8
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Studying the patterns hidden in gene expression data helps to understand the functionality of genes. However, because of the large number of genes and the complexity of biological networks, it is difficult to study the resulting mass of data, which often consists of millions of measurements. Clustering techniques are applied to reveal natural structures and to identify interesting patterns in a given gene expression data set. Semi-supervised classification is a new direction in machine learning. It requires a large amount of unlabeled data and a small amount of labeled data. Semi-supervised classification in general performs better than unsupervised classification. To the best of our knowledge, however, no existing work solves the gene expression data clustering problem using semi-supervised classification techniques. In the current paper we attempt to solve the gene expression data clustering problem using a multiobjective optimization based semi-supervised classification technique, with the aim of attaining good-quality partitions using only a few labeled data. To generate the labeled data, the Fuzzy C-means clustering technique is applied initially. To determine the partitioning automatically, multiple cluster centers corresponding to a cluster are encoded in the form of a string. To compute the quality of the obtained partitioning, the values of five objective functions are computed. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on five publicly available benchmark gene expression data sets. Comparison with existing techniques for gene expression data clustering shows that the proposed method is the most effective one. Statistical and biological significance tests have also been carried out.
Pages: 421-439
Number of pages: 19