Effective gene expression prediction from sequence by integrating long-range interactions

Times cited: 519
Authors
Avsec, Ziga [1]
Agarwal, Vikram [2]
Visentin, Daniel [1]
Ledsam, Joseph R. [1, 3]
Grabska-Barwinska, Agnieszka [1]
Taylor, Kyle R. [1]
Assael, Yannis [1]
Jumper, John [1]
Kohli, Pushmeet [1]
Kelley, David R. [2]
Affiliations
[1] DeepMind, London, England
[2] Calico Life Sci, San Francisco, CA 94080 USA
[3] Google, Tokyo, Japan
Keywords
DNA; GENOME; VARIANTS
DOI
10.1038/s41592-021-01252-x
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
By using a new deep learning architecture, Enformer leverages long-range information to improve prediction of gene expression on the basis of DNA sequence. How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learned to predict enhancer-promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution.
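The variant-effect and saturation-mutagenesis evaluations in the abstract rest on one simple idea: score a sequence with and without a substitution, then compare the two predictions. The sketch below illustrates this under stated assumptions. The model is taken to map a one-hot DNA sequence to a single number. `toy_predict` is a hypothetical GC-content stand-in, not Enformer, which predicts many tracks per genomic bin. All function names are illustrative, and the alternate-minus-reference difference is one common scoring convention.

```python
from typing import Callable, Dict, List, Tuple

BASES = "ACGT"
OneHot = List[List[float]]


def one_hot(seq: str) -> OneHot:
    """Encode DNA as a (length x 4) one-hot matrix; N or other symbols become all zeros."""
    index = {b: i for i, b in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0.0] * 4
        if base in index:
            row[index[base]] = 1.0
        rows.append(row)
    return rows


def toy_predict(x: OneHot) -> float:
    """HYPOTHETICAL stand-in for a trained model: returns the GC fraction of the input."""
    if not x:
        return 0.0
    gc = sum(row[1] + row[2] for row in x)  # columns 1 and 2 are C and G
    return gc / len(x)


def variant_effect(predict: Callable[[OneHot], float], ref_seq: str, pos: int, alt: str) -> float:
    """Effect of a single-nucleotide substitution: prediction(alt) - prediction(ref)."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return predict(one_hot(alt_seq)) - predict(one_hot(ref_seq))


def saturation_mutagenesis(predict: Callable[[OneHot], float], ref_seq: str) -> Dict[Tuple[int, str], float]:
    """Score every possible single-base substitution at every position."""
    effects = {}
    for pos, ref_base in enumerate(ref_seq.upper()):
        for alt in BASES:
            if alt != ref_base:
                effects[(pos, alt)] = variant_effect(predict, ref_seq, pos, alt)
    return effects
```

For example, `variant_effect(toy_predict, "AAAA", 0, "G")` returns 0.25, and `saturation_mutagenesis(toy_predict, "ACGT")` yields 12 scores: three alternate bases at each of four positions. A real model would be substituted for `toy_predict`, and its multi-track output would need to be reduced to a scalar, for example by summing over the bins of interest.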
Pages: 1196+
Page count: 24