A linear time biclustering algorithm for time series gene expression data

Authors
Madeira, SC
Oliveira, AL
Affiliations
[1] INESC, ID, Lisbon, Portugal
[2] Univ Tecn Lisboa, IST, Lisbon, Portugal
[3] Univ Beira Interior, Covilha, Portugal
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Several non-supervised machine learning methods have been used in the analysis of gene expression data obtained from microarray experiments. Recently, biclustering, a non-supervised approach that performs simultaneous clustering on the row and column dimensions of the data matrix, has been shown to be remarkably effective in a variety of applications. The goal of biclustering is to find subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated behaviors. In the most common settings, biclustering is an NP-complete problem, and heuristic approaches are used to obtain sub-optimal solutions using reasonable computational resources. In this work, we examine a particular setting of the problem, where we are concerned with finding biclusters in time series expression data. In this context, we are interested in finding biclusters with consecutive columns. For this particular version of the problem, we propose an algorithm that finds and reports all relevant biclusters in time linear in the size of the data matrix. This complexity is obtained by manipulating a discretized version of the matrix and by using string processing techniques based on suffix trees. We report results on both synthetic and real data that show the effectiveness of the approach.
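To make the problem setting concrete, the following is a minimal sketch of what "biclusters with consecutive columns" over a discretized matrix can look like. It is not the authors' algorithm: it uses brute-force window enumeration instead of suffix trees, so it is not linear time, and it reports every qualifying group without filtering for maximality. The three-symbol alphabet (`U`, `D`, `N`), the `threshold` parameter, and the function names `discretize` and `contiguous_biclusters` are assumptions made for illustration only.

```python
from collections import defaultdict


def discretize(matrix, threshold=0.5):
    """Turn each row of expression values into a string of symbols.

    Assumed alphabet (not taken from the abstract): 'U' for an increase
    larger than `threshold` between consecutive time points, 'D' for a
    decrease larger than `threshold`, and 'N' otherwise.
    """
    patterns = []
    for values in matrix:
        symbols = []
        for prev, curr in zip(values, values[1:]):
            delta = curr - prev
            if delta > threshold:
                symbols.append("U")
            elif delta < -threshold:
                symbols.append("D")
            else:
                symbols.append("N")
        patterns.append("".join(symbols))
    return patterns


def contiguous_biclusters(patterns, min_rows=2, min_cols=2):
    """Naive search: group rows that share the same symbol string over
    the same window of consecutive columns.

    Cost grows with rows * columns^2, unlike the suffix-tree approach
    described in the abstract. Returns (start, end, substring, rows)
    tuples, where `end` is exclusive.
    """
    n_cols = len(patterns[0]) if patterns else 0
    found = []
    for start in range(n_cols):
        for end in range(start + min_cols, n_cols + 1):
            groups = defaultdict(list)
            for row_index, pattern in enumerate(patterns):
                groups[pattern[start:end]].append(row_index)
            for substring, members in groups.items():
                if len(members) >= min_rows:
                    found.append((start, end, substring, members))
    return found


# Hypothetical toy data: three genes measured at four time points.
expression = [
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 1.2, 2.5, 1.0],
    [3.0, 2.0, 1.0, 1.0],
]
patterns = discretize(expression)
biclusters = contiguous_biclusters(patterns)
```

In this toy input the first two genes rise twice and then fall, so they are grouped together over every window of at least two consecutive transitions, while the third gene follows a different pattern and is left out. A suffix-tree formulation would identify such shared substrings without enumerating every window.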
Pages: 39-52
Page count: 14