3DSMILES-GPT: 3D molecular pocket-based generation with token-only large language model

Authors
Wang, Jike [1]
Luo, Hao [1]
Qin, Rui [1]
Wang, Mingyang [1]
Wan, Xiaozhe [2]
Fang, Meijing [1]
Zhang, Odin [1]
Gou, Qiaolin [1]
Su, Qun [1]
Shen, Chao [1]
You, Ziyi [1]
Liu, Liwei [2]
Hsieh, Chang-Yu [1]
Hou, Tingjun [1]
Kang, Yu [1]
Affiliations
[1] Zhejiang Univ, Coll Pharmaceut Sci, Hangzhou 310058, Zhejiang, Peoples R China
[2] Huawei Technol Co Ltd, Cent Res Inst, Adv Comp & Storage Lab, Lab 2012, Nanjing 210000, Jiangsu, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
EFFICIENT; LIBRARY;
DOI
10.1039/d4sc06864e
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The generation of three-dimensional (3D) molecules based on target structures is a cutting-edge challenge in drug discovery. Many existing approaches produce molecules with invalid configurations, unphysical conformations, suboptimal drug-likeness, and limited synthesizability, and they require long generation times. To address these challenges, we present 3DSMILES-GPT, a fully language-model-driven framework for 3D molecular generation that uses tokens exclusively. We treat both two-dimensional (2D) and 3D molecular representations as linguistic expressions, combine them into full-dimensional representations, and pre-train the model on a dataset of tens of millions of drug-like molecules. This token-only approach enables the model to learn the 2D and 3D characteristics of molecules at scale. Subsequently, we fine-tune the model on paired structural data of protein pockets and molecules, followed by reinforcement learning to further optimize the biophysical and chemical properties of the generated molecules. Experimental results show that molecules generated by 3DSMILES-GPT comprehensively outperform those from existing methods in binding affinity, drug-likeness (QED), and synthetic accessibility score (SAS). Notably, it achieves a 33% improvement in QED while maintaining state-of-the-art binding affinity as estimated by Vina docking. Generation is also fast, averaging approximately 0.45 seconds per generation, a threefold speedup over the fastest existing methods. 3DSMILES-GPT has the potential to positively impact the generation of 3D molecules in drug discovery.
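To make the token-only idea concrete, the following is a minimal sketch of how a 2D string and 3D coordinates could be serialized into a single token sequence. It is not the authors' format: the special tokens (`<bos>`, `<3d>`, `<sep>`, `<eos>`), the one-decimal coordinate rounding, and the function names are all assumptions introduced here for illustration.

```python
import re

# Hypothetical SMILES token pattern for illustration only; the paper's
# actual vocabulary is not described in the abstract.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:~*$]"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into atom, bond, and ring-closure tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse inputs containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def tokenize_coords(coords, decimals=1):
    """Render each atom's (x, y, z) as fixed-precision text tokens.

    Assumption: coordinates are rounded to `decimals` places and each
    atom's triple is terminated by a hypothetical "<sep>" token.
    """
    tokens = []
    for x, y, z in coords:
        for value in (x, y, z):
            tokens.append(f"{value:.{decimals}f}")
        tokens.append("<sep>")
    return tokens


def build_sequence(smiles, coords):
    """Concatenate the 2D and 3D parts into one token sequence."""
    return (
        ["<bos>"]
        + tokenize_smiles(smiles)
        + ["<3d>"]
        + tokenize_coords(coords)
        + ["<eos>"]
    )


if __name__ == "__main__":
    # Made-up coordinates for the heavy atoms of ethanol.
    example = build_sequence(
        "CCO", [(0.0, 0.0, 0.0), (1.52, 0.0, 0.0), (2.03, 1.31, 0.0)]
    )
    print(" ".join(example))
```

A sequence of this kind can be fed to an ordinary autoregressive language model, which is the sense in which both representations are handled as text; how the published model orders atoms or discretizes coordinates may differ.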
Pages: 637-648
Number of pages: 12