Prediction of Motifs Based on a Repeated-Measures Model for Integrating Cross-Species Sequence and Expression Data

Cited by: 3
Authors
Siewert, Elizabeth A. [1]
Kechris, Katerina J. [1]
Affiliations
[1] Univ Colorado, Denver, CO 80202 USA
Keywords
GENE-EXPRESSION; REGULATORY ELEMENTS; DISCOVERY; NETWORKS; CONSERVATION; COEXPRESSION; EVOLUTION; SELECTION; PROFILES; PATTERNS;
DOI
10.2202/1544-6115.1464
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
De novo identification of transcription factor binding sites (TFBSs) is a challenging computational problem because TFBSs are relatively short sequences buried in long genomic regions. Earlier methods incorporated genome-wide expression data and promoter sequences into a linear-model framework, regressing expression on counts of putative TFBSs in promoters for a single species. More recently, it has been shown that examining sequence data across multiple species improves the prediction of TFBSs. In this work, we describe an extension of the single-species, linear-model framework for the analysis of paired cross-species sequence and expression data. A repeated-measures model for gene-expression measurements across species is used, accounting for phylogenetic relationships among species through the error covariance structure. This multiple-species algorithm is applied to a data set of four yeast species grown under heat-shock conditions, and comparisons are made to the single-species algorithm. Using evaluations based on transcription factor binding strength and an independent source of expression data, we find that the multiple-species results show an improvement in the prediction of TFBSs.
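The modeling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single candidate motif, a known species-by-species error covariance matrix, and a plain generalized least squares (GLS) fit. The function name `gls_motif_effect`, the covariance values, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def gls_motif_effect(counts, expr, species_cov):
    """Estimate [intercept, slope] for expression regressed on motif counts.

    counts, expr: arrays of shape (n_genes, n_species).
    species_cov: (n_species, n_species) error covariance shared by all genes
    (an assumption of this sketch; a phylogeny would normally inform it).
    """
    counts = np.asarray(counts, dtype=float)
    expr = np.asarray(expr, dtype=float)
    n_genes, n_species = counts.shape

    # Whitening: with cov = L L^T, multiplying by L^{-1} decorrelates the
    # repeated measurements of one gene across species.
    chol = np.linalg.cholesky(np.asarray(species_cov, dtype=float))

    xtx = np.zeros((2, 2))
    xty = np.zeros(2)
    for g in range(n_genes):
        design = np.column_stack([np.ones(n_species), counts[g]])
        design_w = np.linalg.solve(chol, design)
        expr_w = np.linalg.solve(chol, expr[g])
        xtx += design_w.T @ design_w
        xty += design_w.T @ expr_w

    # Solve the pooled GLS normal equations.
    return np.linalg.solve(xtx, xty)


# Hypothetical covariance for four species: closer relatives are given
# more strongly correlated errors.
cov = np.array([
    [1.0, 0.5, 0.2, 0.1],
    [0.5, 1.0, 0.2, 0.1],
    [0.2, 0.2, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
])

# Simulated, noise-free data: expression = 0.5 + 2.0 * motif count.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(50, 4))
expr = 0.5 + 2.0 * counts
beta = gls_motif_effect(counts, expr, cov)
```

Setting `species_cov` to the identity matrix reduces this to ordinary least squares on the pooled measurements, which ignores the relatedness among species.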
Pages: 36