A foundation model of transcription across human cell types

Times cited: 1
Authors
Fu, Xi [1,2]
Mo, Shentong [3,4]
Buendia, Alejandro [1]
Laurent, Anouchka P. [5]
Shao, Anqi [6]
Alvarez-Torres, Maria del Mar [1]
Yu, Tianji [1]
Tan, Jimin [7]
Su, Jiayu [1]
Sagatelian, Romella [1]
Ferrando, Adolfo A. [5,8]
Ciccia, Alberto [9]
Lan, Yanyan [10,11]
Owens, David M. [6,12]
Palomero, Teresa [5,12]
Xing, Eric P. [3,4]
Rabadan, Raul [1,2]
Affiliations
[1] Columbia Univ, Dept Syst Biol, Program Math Genom, New York, NY 10027 USA
[2] Columbia Univ, Dept Biomed Informat, New York, NY 10027 USA
[3] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
[4] Carnegie Mellon Univ, Dept Machine Learning, Pittsburgh, PA 15213 USA
[5] Columbia Univ, Inst Canc Genet, New York, NY USA
[6] Columbia Univ, Dept Dermatol, New York, NY USA
[7] NYU, Inst Syst Genet, Grossman Sch Med, New York, NY USA
[8] Regeneron, Regeneron Genet Ctr, Tarrytown, NY USA
[9] Columbia Univ, Dept Genet & Dev, New York, NY USA
[10] Tsinghua Univ, Inst AI Ind Res, Beijing, Peoples R China
[11] Tsinghua Univ, Beijing Frontier Res Ctr Biol Struct, Beijing, Peoples R China
[12] Columbia Univ, Dept Pathol & Cell Biol, New York, NY USA
Keywords
GENE-EXPRESSION; TARGET GENES; PAX5; METHYLATION; CHROMATIN; DNA
DOI
10.1038/s41586-024-08391-z
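Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09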
Abstract
Transcriptional regulation, which involves a complex interplay between regulatory sequences and proteins, directs all biological processes. Existing computational models of transcription lack the generalizability needed to extrapolate accurately to unseen cell types and conditions. Here we introduce GET (general expression transformer), an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types [1,2]. Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression, even in previously unseen cell types [3]. GET also shows remarkable adaptability to new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and it uncovers universal and cell-type-specific transcription factor interaction networks. We evaluated its performance in predicting regulatory activity, inferring regulatory elements and regulators, and identifying physical interactions between transcription factors, and found that it outperforms current models [4] in predicting the readout of lentivirus-based massively parallel reporter assays [5,6]. In fetal erythroblasts [7], we identified distal (greater than 1 Mbp) regulatory regions that were missed by previous models, and, in B cells, we identified a lymphocyte-specific transcription factor-transcription factor interaction that explains the functional significance of a leukaemia-risk-predisposing germline mutation [8-10]. In sum, we provide a generalizable and accurate model of transcription, together with cell-type-specific catalogues of gene regulation and transcription factor interactions.
Pages: 965-973
Number of pages: 28