Dinucleotide Weight Matrices for Predicting Transcription Factor Binding Sites: Generalizing the Position Weight Matrix

Citations: 63
Authors
Siddharthan, Rahul [1 ]
Affiliation
[1] Inst Math Sci, Madras 600113, Tamil Nadu, India
Source
PLOS ONE | 2010 / Vol. 5 / Issue 03
Keywords
NONADDITIVITY; SEQUENCES; DATABASE; CODE;
DOI
10.1371/journal.pone.0009722
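Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes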
07 ; 0710 ; 09 ;
Abstract
Background: Identifying transcription factor binding sites (TFBS) in silico is key in understanding gene regulation. TFBS are string patterns that exhibit some variability, commonly modelled as "position weight matrices" (PWMs). Though convenient, the PWM has significant limitations, in particular the assumed independence of positions within the binding motif; and predictions based on PWMs are usually not very specific to known functional sites. Analysis here on binding sites in yeast suggests that correlation of dinucleotides is not limited to near-neighbours, but can extend over considerable gaps.

Methodology/Principal Findings: I describe a straightforward generalization of the PWM model, that considers frequencies of dinucleotides instead of individual nucleotides. Unlike previous efforts, this method considers all dinucleotides within an extended binding region, and does not make an attempt to determine a priori the significance of particular dinucleotide correlations. I describe how to use a "dinucleotide weight matrix" (DWM) to predict binding sites, dealing in particular with the complication that its entries are not independent probabilities. Benchmarks show, for many factors, a dramatic improvement over PWMs in precision of predicting known targets. In most cases, significant further improvement arises by extending the commonly defined "core motifs" by about 10 bp on either side. Though this flanking sequence shows no strong motif at the nucleotide level, the predictive power of the dinucleotide model suggests that the "signature" in DNA sequence of protein-binding affinity extends beyond the core protein-DNA contact region.

Conclusion/Significance: While computationally more demanding and slower than PWM-based approaches, this dinucleotide method is straightforward, both conceptually and in implementation, and can serve as a basis for future improvements.
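
The contrast the abstract draws between single-nucleotide and dinucleotide frequencies can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the pseudocount of 1.0, and the uniform background of 0.25 are assumptions made here for illustration, and the sketch only tabulates dinucleotide frequencies for every position pair. It does not reproduce the paper's DWM scoring, which has to handle the fact that those entries are not independent probabilities.

```python
import math
from itertools import combinations

BASES = "ACGT"


def pwm(sites, pseudo=1.0):
    """Per-position nucleotide frequencies from aligned sites (assumed pseudocount)."""
    length = len(sites[0])
    matrix = []
    for i in range(length):
        counts = {b: pseudo for b in BASES}
        for s in sites:
            counts[s[i]] += 1
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in BASES})
    return matrix


def pwm_log_odds(matrix, seq, background=0.25):
    """Standard PWM score: positions are treated as independent."""
    return sum(math.log(matrix[i][b] / background) for i, b in enumerate(seq))


def dinucleotide_frequencies(sites, pseudo=1.0):
    """Frequencies of base pairs at every position pair i < j, not only neighbours."""
    length = len(sites[0])
    table = {}
    for i, j in combinations(range(length), 2):
        counts = {a + b: pseudo for a in BASES for b in BASES}
        for s in sites:
            counts[s[i] + s[j]] += 1
        total = sum(counts.values())
        table[(i, j)] = {pair: n / total for pair, n in counts.items()}
    return table
```

For aligned sites of length w, `dinucleotide_frequencies` returns w(w-1)/2 tables of 16 entries each, which gives a sense of why the dinucleotide approach is described as more computationally demanding than a PWM with w columns of 4 entries.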
Pages: 10
Related papers
50 in total
  • [1] Occupancy Classification of Position Weight Matrix-Inferred Transcription Factor Binding Sites
    Wright, Hollis
    Cohen, Aaron
    Soenmez, Kemal
    Yochum, Gregory
    McWeeney, Shannon
    PLOS ONE, 2011, 6 (11):
  • [2] SIMCHIP: prediction of transcription factor DNA binding landscape and position weight matrices evaluation
    Minguet, E. G.
    Moyroud, E.
    Monnieux, M.
    Warthmann, N.
    Weigel, D.
    Blazquez, M. A.
    Parcy, F.
    FEBS JOURNAL, 2012, 279 : 520 - 520
  • [3] MORPHEUS, a Webtool for Transcription Factor Binding Analysis Using Position Weight Matrices with Dependency
    Minguet, Eugenio Gomez
    Segard, Stephane
    Charavay, Celine
    Parcy, Francois
    PLOS ONE, 2015, 10 (08):
  • [4] Similarity of position frequency matrices for transcription factor binding sites
    Schones, DE
    Sumazin, P
    Zhang, MQ
    BIOINFORMATICS, 2005, 21 (03) : 307 - 313
  • [5] Identification of co-occurring transcription factor binding sites from DNA sequence using clustered position weight matrices
    Oh, Young Min
    Kim, Jong Kyoung
    Choi, Seungjin
    Yoo, Joo-Yeon
    NUCLEIC ACIDS RESEARCH, 2012, 40 (05)
  • [6] Optimized Position Weight Matrices in Prediction of Novel Putative Binding Sites for Transcription Factors in the Drosophila melanogaster Genome
    Morozov, Vyacheslav Y.
    Ioshikhes, Ilya P.
    PLOS ONE, 2013, 8 (08):
  • [7] Tree-Based Position Weight Matrix Approach to Model Transcription Factor Binding Site Profiles
    Bi, Yingtao
    Kim, Hyunsoo
    Gupta, Ravi
    Davuluri, Ramana V.
    PLOS ONE, 2011, 6 (09):
  • [8] Position Weight Matrix or Acyclic Probabilistic Finite Automaton: Which model to use? A decision rule inferred for the prediction of transcription factor binding sites
    Lavezzo, Guilherme Miura
    Lauretto, Marcelo de Souza
    Andrioli, Luiz Paulo Moura
    Machado-Lima, Ariane
    GENETICS AND MOLECULAR BIOLOGY, 2023, 46 (04)
  • [9] A DNA shape-based regulatory score improves position-weight matrix-based recognition of transcription factor binding sites
    Yang, Jichen
    Ramsey, Stephen A.
    BIOINFORMATICS, 2015, 31 (21) : 3445 - 3450
  • [10] Reliable scaling of position weight matrices for binding strength comparisons between transcription factors
    Xiaoyan Ma
    Daphne Ezer
    Carmen Navarro
    Boris Adryan
    BMC Bioinformatics, 16