Self-Supervised Pre-Training via Multi-View Graph Information Bottleneck for Molecular Property Prediction

Cited by: 1
Authors
Zang, Xuan [1 ]
Zhang, Junjie [1 ]
Tang, Buzhou [2 ,3 ]
Affiliations
[1] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[3] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Task analysis; Drugs; Graph neural networks; Representation learning; Perturbation methods; Message passing; Data mining; Drug analysis; graph neural networks; molecular property prediction; molecular pre-training;
DOI
10.1109/JBHI.2024.3422488
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Molecular representation learning has remarkably accelerated the development of drug analysis and discovery. It implements machine learning methods to encode molecule embeddings for diverse downstream drug-related tasks. Due to the scarcity of labeled molecular data, self-supervised molecular pre-training is promising, as it can exploit large-scale unlabeled molecular data to promote representation learning. Although many universal graph pre-training methods have been successfully introduced into molecular learning, some limitations remain. Many graph augmentation methods, such as atom deletion and bond perturbation, tend to destroy the intrinsic properties and connections of molecules. In addition, identifying subgraphs that are important to specific chemical properties is also challenging for molecular learning. To address these limitations, we propose the self-supervised Molecular Graph Information Bottleneck (MGIB) model for molecular pre-training. MGIB observes molecular graphs from the atom view and the motif view, deploys a learnable graph compression process to extract the core subgraphs, and extends the graph information bottleneck into the self-supervised molecular pre-training framework. Model analysis validates the contribution of the self-supervised graph information bottleneck and illustrates the interpretability of MGIB through the extracted subgraphs. Extensive experiments on molecular property prediction, covering 7 binary classification tasks and 6 regression tasks, demonstrate the effectiveness and superiority of the proposed MGIB.
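The abstract's core mechanism, a learnable compression that keeps only "core" nodes while aligning two views of the same molecule, can be sketched numerically. The following is a minimal NumPy illustration of that information-bottleneck-style objective, not the paper's actual MGIB implementation: the one-step mean-aggregation encoder, the sigmoid keep-score, and the cosine-alignment plus sparsity loss are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy molecular graph: 6 atoms, 4-dim features, symmetric adjacency, no self-loops.
n, d = 6, 4
adj = (rng.random((n, n)) < 0.4).astype(float)
adj = np.triu(adj, 1)
adj = adj + adj.T
feats = rng.normal(size=(n, d))
w = rng.normal(size=(d, d))   # encoder weight (would be learned)
v = rng.normal(size=(d,))     # keep-score weight (would be learned)

def encode(adj, feats, w):
    # One round of mean message passing (self + neighbours), then a linear map.
    deg = adj.sum(axis=1, keepdims=True) + 1.0
    h = (feats + adj @ feats) / deg
    return h @ w

def node_keep_probs(h, v):
    # "Graph compression": per-node probability of belonging to the core subgraph.
    return 1.0 / (1.0 + np.exp(-(h @ v)))

def ib_style_loss(z_full, z_core, keep_p, beta=0.1):
    # Alignment term: negative cosine similarity between pooled embeddings of the
    # full view and the compressed view (keep shared, predictive information).
    a, m = z_full.mean(axis=0), z_core.mean(axis=0)
    align = -np.dot(a, m) / (np.linalg.norm(a) * np.linalg.norm(m) + 1e-8)
    # Compression term: penalise keeping too many nodes (discard the rest).
    compress = np.mean(keep_p)
    return align + beta * compress

h = encode(adj, feats, w)
keep_p = node_keep_probs(h, v)
z_full = h                      # uncompressed view of the molecule
z_core = keep_p[:, None] * h    # soft-masked core subgraph
loss = ib_style_loss(z_full, z_core, keep_p)
```

In the paper this trade-off is optimized across an atom-level and a motif-level view with learned encoders; here a single gradient-free forward pass just shows how the two terms pull against each other.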
Pages: 7659 - 7669
Number of pages: 11
Related Papers
50 records
  • [1] Self-Supervised pre-training model based on Multi-view for MOOC Recommendation
    Tian, Runyu
    Cai, Juanjuan
    Li, Chuanzhen
    Wang, Jingling
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 252
  • [2] Self-Supervised Information Bottleneck for Deep Multi-View Subspace Clustering
    Wang, Shiye
    Li, Changsheng
    Li, Yanming
    Yuan, Ye
    Wang, Guoren
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 1555 - 1567
  • [3] Single-atom catalysts property prediction via Supervised and Self-Supervised pre-training models
    Wang, Lanjing
    Chen, Honghao
    Yang, Longqi
    Li, Jiali
    Li, Yong
    Wang, Xiaonan
    CHEMICAL ENGINEERING JOURNAL, 2024, 487
  • [4] Thyroid ultrasound diagnosis improvement via multi-view self-supervised learning and two-stage pre-training
    Wang, Jian
    Yang, Xin
    Jia, Xiaohong
    Xue, Wufeng
    Chen, Rusi
    Chen, Yanlin
    Zhu, Xiliang
    Liu, Lian
    Cao, Yan
    Zhou, Jianqiao
    Ni, Dong
    Gu, Ning
    COMPUTERS IN BIOLOGY AND MEDICINE, 2024, 171
  • [5] MVEB: Self-Supervised Learning With Multi-View Entropy Bottleneck
    Wen, Liangjian
    Wang, Xiasi
    Liu, Jianzhuang
    Xu, Zenglin
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (09) : 6097 - 6108
  • [6] Multi-view Self-supervised Heterogeneous Graph Embedding
    Zhao, Jianan
    Wen, Qianlong
    Sun, Shiyu
    Ye, Yanfang
    Zhang, Chuxu
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT II, 2021, 12976 : 319 - 334
  • [7] Self-Supervised Graph Representation Learning via Information Bottleneck
    Gu, Junhua
    Zheng, Zichen
    Zhou, Wenmiao
    Zhang, Yajuan
    Lu, Zhengjun
    Yang, Liang
    SYMMETRY-BASEL, 2022, 14 (04)
  • [8] Self-supervised ECG pre-training
    Liu, Han
    Zhao, Zhenbo
    She, Qiang
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2021, 70
  • [9] Masked Feature Prediction for Self-Supervised Visual Pre-Training
    Wei, Chen
    Fan, Haoqi
    Xie, Saining
    Wu, Chao-Yuan
    Yuille, Alan
    Feichtenhofer, Christoph
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 14648 - 14658
  • [10] Self-Supervised Graph Convolutional Network for Multi-View Clustering
    Xia, Wei
    Wang, Qianqian
    Gao, Quanxue
    Zhang, Xiangdong
    Gao, Xinbo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 24 : 3182 - 3192