VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder

Cited by: 27
Authors
Samanta, Soumitra [1]
O'Hagan, Steve [2, 4]
Swainston, Neil [1]
Roberts, Timothy J. [1]
Kell, Douglas B. [1, 3]
Affiliations
[1] Univ Liverpool, Inst Syst Mol & Integrat Biol, Dept Biochem & Syst Biol, Crown St, Liverpool L69 7ZB, Merseyside, England
[2] Univ Manchester, Manchester Inst Biotechnol, Dept Chem, 131 Princess St, Manchester M1 7DN, Lancs, England
[3] Tech Univ Denmark, Novo Nordisk Fdn Ctr Biosustainabil, Bldg 220, DK-2800 Lyngby, Denmark
[4] Univ Coll London Hosp NHS Fdn Trust, 250 Euston Rd, London NW1 2PB, England
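Source
MOLECULES | 2020 / Vol. 25 / Issue 15
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC); UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords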
cheminformatics; molecular similarity; deep learning; variational autoencoder; SMILES; PYROLYSIS MASS-SPECTROMETRY; DRUG DISCOVERY; MARKETED DRUGS; DESIGN; DESCRIPTORS; FINGERPRINTS; NETWORKS; REPRESENTATION; PROMISCUITY; FOUNDATIONS;
DOI
10.3390/molecules25153446
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Molecular similarity is an elusive but core "unsupervised" cheminformatics concept, yet different "fingerprint" encodings of molecular structures return very different similarity values, even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none are "better" than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similarity, in the form of a variational autoencoder (VAE). This learns the joint distribution p(z|x) where z is a latent vector and x are the (same) input/output data. It takes the form of a "bowtie"-shaped artificial neural network. In the middle is a "bottleneck layer" or latent vector in which inputs are transformed into, and represented as, a vector of numbers (encoding), with a reverse process (decoding) seeking to return the SMILES string that was the input. We train a VAE on over six million druglike molecules and natural products (including over one million in the final holdout set). The VAE vector distances provide a rapid and novel metric for molecular similarity that is both easily and rapidly calculated. We describe the method and its application to a typical similarity problem in cheminformatics.
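The idea of using distances between latent vectors as a similarity measure can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the latent vectors are made-up three-dimensional placeholders rather than outputs of the trained VAE, the molecule names are hypothetical labels, Euclidean distance is one possible choice of vector distance (the abstract does not specify which one is used), and the mapping 1 / (1 + d) is just one convenient way to turn a distance into a score in (0, 1].

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two latent vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("latent vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_to_similarity(d):
    """Map a non-negative distance to a score in (0, 1]; 1.0 means identical.

    This mapping is an illustrative assumption, not taken from the paper.
    """
    return 1.0 / (1.0 + d)

def rank_by_similarity(query_vec, library):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [
        (name, distance_to_similarity(euclidean_distance(query_vec, vec)))
        for name, vec in library.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Placeholder latent vectors (hypothetical values, NOT produced by a real VAE).
library = {
    "mol_a": [0.0, 0.0, 0.0],
    "mol_b": [3.0, 4.0, 0.0],
    "mol_c": [0.5, 0.0, 0.0],
}
query = [0.0, 0.0, 0.0]

ranking = rank_by_similarity(query, library)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a real setting the placeholder vectors would be replaced by the encoder's output for each SMILES string; because the comparison is a fixed-length vector operation, scoring a query against a large library is cheap once the encodings exist.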
Pages: 16