Dinucleotide Weight Matrices for Predicting Transcription Factor Binding Sites: Generalizing the Position Weight Matrix

Cited by: 63
Authors
Siddharthan, Rahul [1 ]
Affiliation
[1] Inst Math Sci, Madras 600113, Tamil Nadu, India
Source
PLOS ONE | 2010 / Volume 5 / Issue 03
Keywords
NONADDITIVITY; SEQUENCES; DATABASE; CODE;
DOI
10.1371/journal.pone.0009722
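Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code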
07 ; 0710 ; 09 ;
Abstract
Background: Identifying transcription factor binding sites (TFBS) in silico is key in understanding gene regulation. TFBS are string patterns that exhibit some variability, commonly modelled as "position weight matrices" (PWMs). Though convenient, the PWM has significant limitations, in particular the assumed independence of positions within the binding motif; and predictions based on PWMs are usually not very specific to known functional sites. Analysis here on binding sites in yeast suggests that correlation of dinucleotides is not limited to near-neighbours, but can extend over considerable gaps.

Methodology/Principal Findings: I describe a straightforward generalization of the PWM model that considers frequencies of dinucleotides instead of individual nucleotides. Unlike previous efforts, this method considers all dinucleotides within an extended binding region, and does not make an attempt to determine a priori the significance of particular dinucleotide correlations. I describe how to use a "dinucleotide weight matrix" (DWM) to predict binding sites, dealing in particular with the complication that its entries are not independent probabilities. Benchmarks show, for many factors, a dramatic improvement over PWMs in precision of predicting known targets. In most cases, significant further improvement arises by extending the commonly defined "core motifs" by about 10 bp on either side. Though this flanking sequence shows no strong motif at the nucleotide level, the predictive power of the dinucleotide model suggests that the "signature" in DNA sequence of protein-binding affinity extends beyond the core protein-DNA contact region.

Conclusion/Significance: While computationally more demanding and slower than PWM-based approaches, this dinucleotide method is straightforward, both conceptually and in implementation, and can serve as a basis for future improvements.
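
To make the contrast between the two models concrete, here is a minimal Python sketch, not the paper's implementation. It builds a PWM (per-position nucleotide frequencies) and a table of dinucleotide frequencies for every pair of positions from a set of aligned sites. The function names, the pseudocount of 1, the uniform background, and the toy sites are all assumptions made for illustration. In particular, `dwm_score` simply averages log-odds over all position pairs. This is one naive way to cope with the entries not being independent, and it should not be read as the scoring scheme the paper describes.

```python
import math
from itertools import combinations, product

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Per-position nucleotide probabilities from equal-length aligned sites."""
    length = len(sites[0])
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: c / total for b, c in counts.items()})
    return pwm

def build_dwm(sites, pseudocount=1.0):
    """Dinucleotide probabilities for every position pair (i, j) with i < j."""
    length = len(sites[0])
    dwm = {}
    for i, j in combinations(range(length), 2):
        counts = {a + b: pseudocount for a, b in product(BASES, repeat=2)}
        for site in sites:
            counts[site[i] + site[j]] += 1
        total = sum(counts.values())
        dwm[(i, j)] = {d: c / total for d, c in counts.items()}
    return dwm

def pwm_score(pwm, seq, background=0.25):
    """Sum of per-position log-odds against a uniform background."""
    return sum(math.log(col[b] / background) for col, b in zip(pwm, seq))

def dwm_score(dwm, seq, background=0.25):
    """Illustrative only (assumption): mean log-odds over all position pairs.

    Pairs share positions, so their probabilities are not independent;
    averaging is a crude normalization chosen for this sketch.
    """
    total = sum(
        math.log(table[seq[i] + seq[j]] / background ** 2)
        for (i, j), table in dwm.items()
    )
    return total / len(dwm)

# Hypothetical toy sites in which positions 0 and 3 are correlated
# (A...T or T...A), although each position alone looks like a 50/50 mix.
sites = ["ACGT", "TCGA", "ACGT", "TCGA"]
pwm = build_pwm(sites)
dwm = build_dwm(sites)

for candidate in ("ACGT", "ACGA"):
    print(candidate, pwm_score(pwm, candidate), dwm_score(dwm, candidate))
```

With these toy sites the PWM cannot distinguish `ACGT` from `ACGA`, because each position is scored independently, while the pairwise table for positions (0, 3) penalizes the unobserved `A...A` combination. Note also that the table covers all position pairs, not only adjacent ones, which mirrors the abstract's observation that correlations can extend over gaps.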
Pages: 10
Related papers
50 in total
  • [41] Predicting Transcription Factor Binding Sites in DNA Sequences Without Prior Knowledge
    Lee, Wook
    Park, Byungkyu
    Choi, Daesik
    Lee, Chungkeun
    Chae, Hanju
    Han, Kyungsook
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2016, PT I, 2016, 9771 : 386 - 391
  • [42] A multiple-feature framework for modelling and predicting transcription factor binding sites
    Pudimat, R
    Schukat-Talamazzini, EG
    Backofen, R
    BIOINFORMATICS, 2005, 21 (14) : 3082 - 3088
  • [43] Transcription factor binding sites recognition by the regularities matrices based on the natural classification method
    Vityaev, E. E.
    Lapardin, K. A.
    Khomicheva, I. V.
    Levitsky, V. G.
    PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON BIOINFORMATICS OF GENOME REGULATION AND STRUCTURE, VOL 1, 2006, : 199 - +
  • [44] Pseudocounts for transcription factor binding sites
    Nishida, Keishin
    Frith, Martin C.
    Nakai, Kenta
    NUCLEIC ACIDS RESEARCH, 2009, 37 (03) : 939 - 944
  • [45] DISPARE: DIScriminative PAttern REfinement for Position Weight Matrices
    Isabelle da Piedade
    Man-Hung Eric Tang
    Olivier Elemento
    BMC Bioinformatics, 10
  • [46] LINEAR-TIME MATCHING OF POSITION WEIGHT MATRICES
    Stojanovic, Nikola
    BIOINFORMATICS 2010: PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON BIOINFORMATICS, 2010, : 66 - 73
  • [47] DISPARE: DIScriminative PAttern REfinement for Position Weight Matrices
    da Piedade, Isabelle
    Tang, Man-Hung Eric
    Elemento, Olivier
    BMC BIOINFORMATICS, 2009, 10
  • [48] Enhanced position weight matrices using mixture models
    Hannenhalli, S
    Wang, LS
    BIOINFORMATICS, 2005, 21 : I204 - I212
  • [49] Predicting splicing patterns from the transcription factor binding sites in the promoter with deep learning
    Lin, Tzu-Chieh
    Tsai, Cheng-Hung
    Shiau, Cheng-Kai
    Huang, Jia-Hsin
    Tsai, Huai-Kuang
    BMC GENOMICS, 2024, 25 (SUPPL 3):
  • [50] Quantum Algorithm for Position Weight Matrix Matching
    Miyamoto, Koichi
    Yamamoto, Naoki
    Sakakibara, Yasubumi
    IEEE TRANSACTIONS ON QUANTUM ENGINEERING, 2023, 4