DeepTSS: multi-branch convolutional neural network for transcription start site identification from CAGE data

Authors
Grigoriadis, Dimitris [1, 2]
Perdikopanis, Nikos [1, 3, 4]
Georgakilas, Georgios K. [4, 5]
Hatzigeorgiou, Artemis G. [1, 2]
Affiliations
[1] Hellenic Pasteur Institute, Athens, 11521, Greece
[2] Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, 35131, Greece
[3] Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, 15784, Greece
[4] Department of Electrical and Computer Engineering, University of Thessaly, Volos, 38221, Greece
[5] ommAI Technologies, Tallinn, Estonia
Keywords
Bioinformatics - Computational methods - Convolution - Convolutional neural networks - Deep learning - Learning systems - Molecular biology - Proteins - Signal processing - Signal to noise ratio
DOI
Not available
Abstract
Background: The widespread usage of Cap Analysis of Gene Expression (CAGE) has led to numerous breakthroughs in understanding the transcription mechanisms. Recent evidence in the literature, however, suggests that CAGE suffers from transcriptional and technical noise. Regardless of the sample quality, there is a significant number of CAGE peaks that are not associated with transcription initiation events. This type of signal is typically attributed to technical noise and more frequently to random five-prime capping or transcription bioproducts. Thus, the need for computational methods emerges, that can accurately increase the signal-to-noise ratio in CAGE data, resulting in error-free transcription start site (TSS) annotation and quantification of regulatory region usage. In this study, we present DeepTSS, a novel computational method for processing CAGE samples, that combines genomic signal processing (GSP), structural DNA features, evolutionary conservation evidence and raw DNA sequence with Deep Learning (DL) to provide single-nucleotide TSS predictions with unprecedented levels of performance.
Results: To evaluate DeepTSS, we utilized experimental data, protein-coding gene annotations and computationally-derived genome segmentations by chromatin states. DeepTSS was found to outperform existing algorithms on all benchmarks, achieving 98% precision and 96% sensitivity (accuracy 95.4%) on the protein-coding gene strategy, with 96.66% of its positive predictions overlapping active chromatin, 98.27% and 92.04% co-localized with at least one transcription factor and H3K4me3 peak.
Conclusions: CAGE is a key protocol in deciphering the language of transcription, however, as every experimental protocol, it suffers from biological and technical noise that can severely affect downstream analyses. DeepTSS is a novel DL-based method for effectively removing noisy CAGE signal. In contrast to existing software, DeepTSS does not require feature selection since the embedded convolutional layers can readily identify patterns and only utilize the important ones for the classification task. This study highlights the key role that DL can play in Molecular Biology, by removing the inherent flaws of experimental protocols, that form the backbone of contemporary research. Here, we show how DeepTSS can unleash the full potential of an already popular and mature method such as CAGE, and push the boundaries of coding and non-coding gene expression regulator research even further. © 2022, The Author(s).
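The Results report precision, sensitivity and accuracy. For readers less familiar with these terms, the following minimal sketch shows how the three metrics are conventionally derived from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the DeepTSS paper or its software.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, sensitivity (recall) and accuracy from confusion counts.

    tp/fp/tn/fn are true-positive, false-positive, true-negative and
    false-negative counts. A metric with an empty denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only (not results from the paper).
example = classification_metrics(tp=90, fp=10, tn=80, fn=20)
```

Precision measures how many predicted TSSs are correct, while sensitivity measures how many true TSSs are recovered, so a method can score high on one and low on the other; this is why the abstract reports both alongside accuracy.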
Related papers
  • [1] DeepTSS: multi-branch convolutional neural network for transcription start site identification from CAGE data
    Dimitris Grigoriadis
    Nikos Perdikopanis
    Georgios K. Georgakilas
    Artemis G. Hatzigeorgiou
    BMC Bioinformatics, 23
  • [2] DeepTSS: multi-branch convolutional neural network for transcription start site identification from CAGE data
    Grigoriadis, Dimitris
    Perdikopanis, Nikos
    Georgakilas, Georgios K.
    Hatzigeorgiou, Artemis G.
    BMC BIOINFORMATICS, 2022, 23 (SUPPL 2)
  • [3] Multi-branch Aggregate Convolutional Neural Network for Image Classification
    Fan, Rui
    Jiang, Pinqun
    Zeng, Shangyou
    Li, Peng
    SERVICE-ORIENTED COMPUTING, ICSOC 2018, 2019, 11434 : 102 - 112
  • [4] Multi-branch sustainable convolutional neural network for disease classification
    Naz, Maria
    Shah, Munam Ali
    Khattak, Hasan Ali
    Wahid, Abdul
    Asghar, Muhammad Nabeel
    Rauf, Hafiz Tayyab
    Khan, Muhammad Attique
    Ameer, Zoobia
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2023, 33 (05) : 1621 - 1633
  • [5] A Modularized Architecture of Multi-Branch Convolutional Neural Network for Image Captioning
    He, Shan
    Lu, Yuanyao
    ELECTRONICS, 2019, 8 (12)
  • [6] Multi-branch convolutional neural network for multiple sclerosis lesion segmentation
    Aslani, Shahab
    Dayan, Michael
    Storelli, Loredana
    Filippi, Massimo
    Murino, Vittorio
    Rocca, Maria A.
    Sona, Diego
    NEUROIMAGE, 2019, 196 : 1 - 15
  • [7] A multi-branch convolutional neural network with density map for aphid counting
    Li, Rui
    Wang, Rujing
    Xie, Chengjun
    Chen, Hongbo
    Long, Qi
    Liu, Liu
    Zhang, Jie
    Chen, Tianjiao
    Hu, Haiying
    Jiao, Lin
    Du, Jianming
    Liu, Haiyun
    BIOSYSTEMS ENGINEERING, 2022, 213 : 148 - 161
  • [8] A multi-branch convolutional neural network for snoring detection based on audio
    Dong, Hao
    Wu, Haitao
    Yang, Guan
    Zhang, Junming
    Wan, Keqin
    COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING, 2024,
  • [9] Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci
    Georgios K. Georgakilas
    Andrea Grioni
    Konstantinos G. Liakos
    Eliska Chalupova
    Fotis C. Plessas
    Panagiotis Alexiou
    Scientific Reports, 10
  • [10] Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci
    Georgakilas, Georgios K.
    Grioni, Andrea
    Liakos, Konstantinos G.
    Chalupova, Eliska
    Plessas, Fotis C.
    Alexiou, Panagiotis
    SCIENTIFIC REPORTS, 2020, 10 (01)