Combining Sequence and Epigenomic Data to Predict Transcription Factor Binding Sites Using Deep Learning

Cited by: 1
Authors
Jing, Fang [1]
Zhang, Shao-Wu [1]
Cao, Zhen [2]
Zhang, Shihua [2,3]
Affiliations
[1] Northwestern Polytech Univ, Coll Automat, Minist Educ, Key Lab Informat Fusion Technol, Xian 710072, Shaanxi, Peoples R China
[2] Chinese Acad Sci, Acad Math & Syst Sci, NCMIS, CEMS,RCSDS, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Sch Math Sci, Beijing 100049, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Bioinformatics; Machine learning; Transcription factors binding sites; Convolutional neural networks; DNA accessibility; Histone modification; CHROMATIN ACCESSIBILITY PREDICTION; NETWORKS;
DOI
10.1007/978-3-319-94968-0_23
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Knowing the locations of transcription factor binding sites (TFBSs) is essential for modeling the underlying binding mechanisms and the downstream cellular functions. Convolutional neural networks (CNNs) have outperformed conventional methods in predicting TFBSs from the primary DNA sequence. In addition to DNA sequence, histone modifications and chromatin accessibility are important factors influencing TFBS activity, and they have recently been explored for predicting TFBSs. However, current methods rarely integrate histone modifications and chromatin accessibility with sequence in a single CNN framework. To this end, we developed a general CNN model that integrates these data to predict TFBSs. We systematically benchmarked a series of architecture variants by varying network width and depth, and explored the effect of sample length by varying the flanking regions. We evaluated the performance of the three data types and their combinations on 256 ChIP-seq experiments and compared the model with competing machine learning methods. We find that the contributions of the three data types are complementary. Moreover, the integrative CNN framework is superior to traditional machine learning methods, with significant improvements.
Pages: 241-252
Page count: 12
Related papers
50 in total
  • [41] Discriminative discovery of transcription factor binding sites from location data
    Kawada, Y
    Sakakibara, Y
    2005 IEEE Computational Systems Bioinformatics Conference, Proceedings, 2005, : 86 - 89
  • [42] DeepCTF: transcription factor binding specificity prediction using DNA sequence plus shape in an attention-based deep learning model
    Tariq, Sana
    Amin, Asjad
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (6-7) : 5239 - 5251
  • [43] Evaluation of deep learning approaches for modeling transcription factor sequence specificity
    Zhang, Yonglin
    Mo, Qi
    Xue, Li
    Luo, Jiesi
    GENOMICS, 2021, 113 (06) : 3774 - 3781
  • [44] TrFAST: A Tool to Predict Signaling Pathway-specific Transcription Factor Binding Sites
    Umair Seemab
    Qurrat ul Ain
    Muhammad Sulaman Nawaz
    Zafar Saeed
    Sajid Rashid
    Genomics, Proteomics & Bioinformatics, 2012, (06) : 354 - 359
  • [45] Deep Learning to Identify Transcription Start Sites from CAGE Data
    Zheng, Hansi
    Li, Xiaoman
    Hu, Haiyan
    2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020, : 168 - 172
  • [46] Using Fully Convolutional Network to Locate Transcription Factor Binding Sites Based on DNA Sequence and Conservation Information
    Zhang, Qinhu
    Xu, Youhong
    Wang, Siguo
    Wu, Yong
    Ye, Yuannong
    Yuan, Chang-An
    Gribova, Valeriya
    Filaretov, Vladimir Fedorovich
    Huang, De-Shuang
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (05) : 2690 - 2699
  • [47] RLBind: a deep learning method to predict RNA-ligand binding sites
    Wang, Kaili
    Zhou, Renyi
    Wu, Yifan
    Li, Min
    BRIEFINGS IN BIOINFORMATICS, 2023, 24 (01)
  • [48] rVista for comparative sequence-based discovery of functional transcription factor binding sites
    Loots, GG
    Ovcharenko, I
    Pachter, L
    Dubchak, I
    Rubin, EM
    GENOME RESEARCH, 2002, 12 (05) : 832 - 839
  • [49] FLEXIBLE STATISTICAL MODELLING OF THE OCCURRENCES OF TRANSCRIPTION FACTOR BINDING SITES ALONG A DNA SEQUENCE
    Kallah-Dagadu, G.
    Nkansah, B. K.
    Howard, N. K.
    ADVANCES AND APPLICATIONS IN STATISTICS, 2018, 53 (06) : 659 - 691
  • [50] An Integrated Approach of Sequence and Text Mining Technology for the Identification of Transcription Factor Binding Sites
    Xiong, Yun
    Yang, Qing
    Qiu, Boren
    Zhu, Yangyong
    2008 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS, PROCEEDINGS, 2008, : 178 - +