Self-Supervised Pre-Training via Multi-View Graph Information Bottleneck for Molecular Property Prediction

Cited by: 1
Authors
Zang, Xuan [1 ]
Zhang, Junjie [1 ]
Tang, Buzhou [2 ,3 ]
Affiliations
[1] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen 518000, Peoples R China
[3] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Task analysis; Drugs; Graph neural networks; Representation learning; Perturbation methods; Message passing; Data mining; Drug analysis; graph neural networks; molecular property prediction; molecular pre-training;
DOI
10.1109/JBHI.2024.3422488
CLC Number
TP [Automation and Computer Technology];
Discipline Code
0812;
Abstract
Molecular representation learning has remarkably accelerated drug analysis and discovery. It applies machine learning methods to encode molecule embeddings for diverse downstream drug-related tasks. Because labeled molecular data are scarce, self-supervised molecular pre-training is promising, as it can exploit large-scale unlabeled molecular data to promote representation learning. Although many universal graph pre-training methods have been successfully introduced into molecular learning, some limitations remain. Many graph augmentation methods, such as atom deletion and bond perturbation, tend to destroy the intrinsic properties and connections of molecules. In addition, identifying subgraphs that are important to specific chemical properties is also challenging for molecular learning. To address these limitations, we propose the self-supervised Molecular Graph Information Bottleneck (MGIB) model for molecular pre-training. MGIB observes molecular graphs from the atom view and the motif view, deploys a learnable graph compression process to extract the core subgraphs, and extends the graph information bottleneck into the self-supervised molecular pre-training framework. Model analysis validates the contribution of the self-supervised graph information bottleneck and illustrates the interpretability of MGIB through the extracted subgraphs. Extensive experiments on molecular property prediction, covering 7 binary classification tasks and 6 regression tasks, demonstrate the effectiveness and superiority of the proposed MGIB.
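For context, the graph compression the abstract describes builds on the classical information bottleneck principle. The generic objective can be sketched as follows (an illustrative formulation only, not the paper's exact loss; the symbols $G$, $G_s$, $Y$, and $\beta$ are assumptions introduced here):

```latex
% Generic (graph) information bottleneck objective -- a sketch.
% G   : input molecular graph
% G_s : compressed core subgraph extracted from G
% Y   : supervision signal (in self-supervised pre-training,
%       a target derived from the data itself, e.g. another view)
% I   : mutual information; beta trades off compression vs. relevance
\max_{G_s} \; I(G_s; Y) \; - \; \beta \, I(G_s; G)
```

Intuitively, the first term keeps the subgraph predictive of the target signal, while the second term penalizes retaining information about the full graph, so $G_s$ is pushed toward a minimal property-relevant core.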
Pages: 7659-7669
Number of pages: 11
Related Papers (50 total)
  • [31] Self-Supervised Graph Attention Networks for Deep Weighted Multi-View Clustering
    Huang, Zongmo
    Ren, Yazhou
    Pu, Xiaorong
    Huang, Shudong
    Xu, Zenglin
    He, Lifang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 7936 - 7943
  • [32] Self-supervised graph neural network with pre-training generative learning for recommendation systems
    Min, Xin
    Li, Wei
    Yang, Jinzhao
    Xie, Weidong
    Zhao, Dazhe
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [33] The Effectiveness of Self-supervised Pre-training for Multi-modal Endometriosis Classification
    Butler, David
    Wang, Hu
    Zhang, Yuan
    To, Minh-Son
    Condous, George
    Leonardi, Mathew
    Knox, Steven
    Avery, Jodie
    Hull, M. Louise
    Carneiro, Gustavo
    2023 45TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY, EMBC, 2023,
  • [35] A debiased self-training framework with graph self-supervised pre-training aided for semi-supervised rumor detection
    Qiao, Yuhan
    Cui, Chaoqun
    Wang, Yiying
    Jia, Caiyan
    NEUROCOMPUTING, 2024, 604
  • [36] Deep learning based on self-supervised pre-training: Application on sandstone content prediction
    Wang, Chong Ming
    Wang, Xing Jian
    Chen, Yang
    Wen, Xue Mei
    Zhang, Yong Heng
    Li, Qing Wu
    FRONTIERS IN EARTH SCIENCE, 2023, 10
  • [37] Spatiotemporal self-supervised pre-training on satellite imagery improves food insecurity prediction
    Cartuyvels, Ruben
    Fierens, Tom
    Coppieters, Emiel
    Moens, Marie-Francine
    Sileo, Damien
    ENVIRONMENTAL DATA SCIENCE, 2023, 2
  • [38] Incomplete multi-view clustering based on information fusion with self-supervised learning
    Cai, Yilong
    Shu, Qianyu
    Zhou, Zhengchun
    Meng, Hua
    INFORMATION FUSION, 2025, 117
  • [39] GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training
    Tian, Xiaoyu
    Ran, Haoxi
    Wang, Yue
    Zhao, Hang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 13570 - 13580