A foundation model of transcription across human cell types

Cited by: 1
Authors
Fu, Xi [1, 2]
Mo, Shentong [3, 4]
Buendia, Alejandro [1]
Laurent, Anouchka P. [5]
Shao, Anqi [6]
Alvarez-Torres, Maria del Mar [1]
Yu, Tianji [1]
Tan, Jimin [7]
Su, Jiayu [1]
Sagatelian, Romella [1]
Ferrando, Adolfo A. [5, 8]
Ciccia, Alberto [9]
Lan, Yanyan [10, 11]
Owens, David M. [6, 12]
Palomero, Teresa [5, 12]
Xing, Eric P. [3, 4]
Rabadan, Raul [1, 2]
Affiliations
[1] Columbia Univ, Dept Syst Biol, Program Math Genom, New York, NY 10027 USA
[2] Columbia Univ, Dept Biomed Informat, New York, NY 10027 USA
[3] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
[4] Carnegie Mellon Univ, Dept Machine Learning, Pittsburgh, PA 15213 USA
[5] Columbia Univ, Inst Canc Genet, New York, NY USA
[6] Columbia Univ, Dept Dermatol, New York, NY USA
[7] NYU, Inst Syst Genet, Grossman Sch Med, New York, NY USA
[8] Regeneron, Regeneron Genet Ctr, Tarrytown, NY USA
[9] Columbia Univ, Dept Genet & Dev, New York, NY USA
[10] Tsinghua Univ, Inst AI Ind Res, Beijing, Peoples R China
[11] Tsinghua Univ, Beijing Frontier Res Ctr Biol Struct, Beijing, Peoples R China
[12] Columbia Univ, Dept Pathol & Cell Biol, New York, NY USA
Keywords
GENE-EXPRESSION; TARGET GENES; PAX5; METHYLATION; CHROMATIN; DNA
DOI
10.1038/s41586-024-08391-z
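Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09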
Abstract
Transcriptional regulation, which involves a complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack generalizability to accurately extrapolate to unseen cell types and conditions. Here we introduce GET (general expression transformer), an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types [1,2]. Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types [3]. GET also shows remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovers universal and cell-type-specific transcription factor interaction networks. We evaluated its performance in prediction of regulatory activity, inference of regulatory elements and regulators, and identification of physical interactions between transcription factors and found that it outperforms current models [4] in predicting lentivirus-based massively parallel reporter assay readout [5,6]. In fetal erythroblasts [7], we identified distal (greater than 1 Mbp) regulatory regions that were missed by previous models, and, in B cells, we identified a lymphocyte-specific transcription factor-transcription factor interaction that explains the functional significance of a leukaemia risk predisposing germline mutation [8-10]. In sum, we provide a generalizable and accurate model for transcription together with catalogues of gene regulation and transcription factor interactions, all with cell type specificity.
Pages: 965-973
Number of pages: 28
Related papers
50 items in total
  • [21] Abstract behavior types: A foundation model for components and their composition
    Arbab, F
    FORMAL METHODS FOR COMPONENTS AND OBJECTS, 2003, 2852 : 33 - 70
  • [22] Single-cell atlases: shared and tissue-specific cell types across human organs
    Elmentaite, Rasa
    Conde, Cecilia Dominguez
    Yang, Lu
    Teichmann, Sarah A.
    NATURE REVIEWS GENETICS, 2022, 23 (07) : 395 - 410
  • [23] Single-cell atlases: shared and tissue-specific cell types across human organs
    Rasa Elmentaite
    Cecilia Domínguez Conde
    Lu Yang
    Sarah A. Teichmann
    Nature Reviews Genetics, 2022, 23 (7) : 395 - 410
  • [24] Integration of high-resolution promoter profiling assays reveals novel, cell type-specific transcription start sites across 115 human cell and tissue types
    Moore, Jill E.
    Zhang, Xiao-Ou
    Elhajjajy, Shaimae I.
    Fan, Kaili
    Pratt, Henry E.
    Reese, Fairlie
    Mortazavi, Ali
    Weng, Zhiping
    GENOME RESEARCH, 2022, 32 (02) : 389 - 402
  • [25] Human box C/D snoRNA processing conservation across multiple cell types
    Scott, Michelle S.
    Ono, Motoharu
    Yamada, Kayo
    Endo, Akinori
    Barton, Geoffrey J.
    Lamond, Angus I.
    NUCLEIC ACIDS RESEARCH, 2012, 40 (08) : 3676 - 3688
  • [26] Disrupted cooperation between transcription factors across diverse cancer types
    Jing Wang
    Qi Liu
    Jingchun Sun
    Yu Shyr
    BMC Genomics, 17
  • [27] Disrupted cooperation between transcription factors across diverse cancer types
    Wang, Jing
    Liu, Qi
    Sun, Jingchun
    Shyr, Yu
    BMC GENOMICS, 2016, 17
  • [28] A Biophysical Model Uncovers the Size Distribution of Migrating Cell Clusters across Cancer Types
    Bocci, Federico
    Jolly, Mohit Kumar
    Onuchic, Jose Nelson
    CANCER RESEARCH, 2019, 79 (21) : 5527 - 5535
  • [29] Model-based translation of DNA damage signaling dynamics across cell types
    Heldring, Muriel M.
    Wijaya, Lukas S.
    Niemeijer, Marije
    Yang, Huan
    Lakhal, Talel
    Le Devedec, Sylvia E.
    van de Water, Bob
    Beltman, Joost B.
    PLOS COMPUTATIONAL BIOLOGY, 2022, 18 (07)
  • [30] Transcription Control in Human Cell Types by Systematic Analysis of ChIP Sequencing Data from the ENCODE
    Devailly, Guillaume
    Joshi, Anagha
    BIOINFORMATICS AND BIOMEDICAL ENGINEERING, IWBBIO 2017, PT II, 2017, 10209 : 315 - 324