Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks

Citations: 121
Authors
Agarwal, Vikram [1, 2]
Shendure, Jay [1, 3, 4]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[2] Cal Life Sci LLC, San Francisco, CA 94080 USA
[3] Howard Hughes Med Inst, Seattle, WA 98195 USA
[4] Univ Washington, Brotman Baty Inst Precis Med, Seattle, WA 98195 USA
Keywords
GENE-EXPRESSION; CHIP-SEQ; TRANSCRIPTION FACTORS; INTEGRATIVE ANALYSIS; SUPER-ENHANCERS; PROMOTER; DNA; ANNOTATION; DISCOVERY; ELEMENTS;
DOI
10.1016/j.celrep.2020.107663
Chinese Library Classification number
Q2 [Cell Biology];
Subject classification codes
071009 ; 090102 ;
Abstract
Algorithms that accurately predict gene structure from primary sequence alone were transformative for annotating the human genome. Can we also predict the expression levels of genes based solely on genome sequence? Here, we sought to apply deep convolutional neural networks toward that goal. Surprisingly, a model that includes only promoter sequences and features associated with mRNA stability explains 59% and 71% of variation in steady-state mRNA levels in human and mouse, respectively. This model, termed Xpresso, more than doubles the accuracy of alternative sequence-based models and isolates rules as predictive as models relying on chromatin immunoprecipitation sequencing (ChIP-seq) data. Xpresso recapitulates genome-wide patterns of transcriptional activity, and its residuals can be used to quantify the influence of enhancers, heterochromatic domains, and microRNAs. Model interpretation reveals that promoter-proximal CpG dinucleotides strongly predict transcriptional activity. Looking forward, we propose cell-type-specific gene-expression predictions based solely on primary sequences as a grand challenge for the field.
Pages: 17