A predictive language model for SARS-CoV-2 evolution

Authors
Ma, Enhao [1]
Guo, Xuan [1, 2]
Hu, Mingda [3]
Wang, Penghua [4]
Wang, Xin [3]
Wei, Congwen [3]
Cheng, Gong [1, 2]
Affiliations
[1] Tsinghua Univ, Sch Basic Med Sci, 30 Shuangqing Rd, Beijing 100084, Peoples R China
[2] Inst Infect Dis, Shenzhen Bay Lab, Guangqiao Rd, Shenzhen 518000, Guangdong, Peoples R China
[3] Beijing Inst Biotechnol, 20 Dongdajie, Beijing 100071, Peoples R China
[4] Univ Connecticut Hlth Ctr, Sch Med, Dept Immunol, Farmington, CT 06030 USA
Funding
National Natural Science Foundation of China;
Keywords
EVASION;
DOI
10.1038/s41392-024-02066-x
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010; 081704;
Abstract
Modeling and predicting mutations are critical for COVID-19 and similar pandemic preparedness. However, existing predictive models have yet to integrate the regularity and randomness of viral mutations with minimal data requirements. Here, we develop a non-demanding language model utilizing both regularity and randomness to predict candidate SARS-CoV-2 variants and mutations that might prevail. We constructed the "grammatical frameworks" of the available S1 sequences for dimension reduction and semantic representation to grasp the model's latent regularity. The mutational profile, defined as the frequency of mutations, was introduced into the model to incorporate randomness. With this model, we identified several variants with significantly enhanced viral infectivity and immune evasion and validated them in wet-lab experiments. By inputting the sequence data from three different time points, we detected circulating strains or vital mutations for the XBB.1.16, EG.5, JN.1, and BA.2.86 strains before their emergence. In addition, our results predicted previously unknown variants that may cause future epidemics. Supported by both data validation and experimental evidence, our study presents a fast-responding, concise, and promising language model, potentially generalizable to other viral pathogens, to forecast viral evolution and detect crucial mutation hotspots, thereby giving early warning of emerging variants that might raise public health concern.
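The abstract defines the mutational profile as the frequency of mutations. As a rough illustration only, and not the authors' implementation, a per-position mutation frequency over pre-aligned sequences could be sketched as below. The function names `mutational_profile` and `site_mutation_rate`, the choice of a single reference sequence, and the toy data are all assumptions made for this example; the record does not say how the paper computes or weights the profile.

```python
from collections import Counter


def mutational_profile(reference, sequences):
    """Per-position frequency of each non-reference residue.

    Hypothetical helper: assumes all sequences are already aligned to
    `reference` and have the same length (no gaps handled here).
    """
    if any(len(s) != len(reference) for s in sequences):
        raise ValueError("all sequences must match the reference length")
    n = len(sequences)
    profile = []
    for pos, ref_residue in enumerate(reference):
        # Count only residues that differ from the reference at this site.
        counts = Counter(s[pos] for s in sequences if s[pos] != ref_residue)
        profile.append({res: c / n for res, c in counts.items()})
    return profile


def site_mutation_rate(profile):
    """Total fraction of sequences mutated at each position."""
    return [sum(site.values()) for site in profile]


# Toy fragment for illustration; not real S1 sequence data.
reference = "NLVKR"
sequences = ["NLVKR", "NLVRR", "NFVRR", "NLVKR"]

profile = mutational_profile(reference, sequences)
rates = site_mutation_rate(profile)
print(rates)
```

Positions with a high rate in such a table would be candidate mutation hotspots; how that signal is combined with the learned sequence representation is specific to the paper and not reproduced here.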
Pages: 17