VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder

Cited by: 27
Authors
Samanta, Soumitra [1]
O'Hagan, Steve [2,4]
Swainston, Neil [1]
Roberts, Timothy J. [1]
Kell, Douglas B. [1,3]
Affiliations
[1] Univ Liverpool, Inst Syst Mol & Integrat Biol, Dept Biochem & Syst Biol, Crown St, Liverpool L69 7ZB, Merseyside, England
[2] Univ Manchester, Manchester Inst Biotechnol, Dept Chem, 131 Princess St, Manchester M1 7DN, Lancs, England
[3] Tech Univ Denmark, Novo Nordisk Fdn Ctr Biosustainabil, Bldg 220, DK-2800 Lyngby, Denmark
[4] Univ Coll London Hosp NHS Fdn Trust, 250 Euston Rd, London NW1 2PB, England
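Source
MOLECULES | 2020 / Vol. 25 / Issue 15
Funding
UK Biotechnology and Biological Sciences Research Council; UK Engineering and Physical Sciences Research Council
Keywords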
cheminformatics; molecular similarity; deep learning; variational autoencoder; SMILES; PYROLYSIS MASS-SPECTROMETRY; DRUG DISCOVERY; MARKETED DRUGS; DESIGN; DESCRIPTORS; FINGERPRINTS; NETWORKS; REPRESENTATION; PROMISCUITY; FOUNDATIONS;
DOI
10.3390/molecules25153446
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010 ; 081704 ;
Abstract
Molecular similarity is an elusive but core "unsupervised" cheminformatics concept, yet different "fingerprint" encodings of molecular structures return very different similarity values, even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none are "better" than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similarity, in the form of a variational autoencoder (VAE). This learns the joint distribution p(z|x) where z is a latent vector and x are the (same) input/output data. It takes the form of a "bowtie"-shaped artificial neural network. In the middle is a "bottleneck layer" or latent vector in which inputs are transformed into, and represented as, a vector of numbers (encoding), with a reverse process (decoding) seeking to return the SMILES string that was the input. We train a VAE on over six million druglike molecules and natural products (including over one million in the final holdout set). The VAE vector distances provide a rapid and novel metric for molecular similarity that is both easily and rapidly calculated. We describe the method and its application to a typical similarity problem in cheminformatics.
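
The idea of using distances between latent vectors as a similarity measure can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which distance is used, so Euclidean distance and cosine similarity are both shown as assumptions, and the three-dimensional vectors and molecule names below are hypothetical toy values standing in for the output of a trained VAE encoder.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two latent vectors of equal length."""
    return math.dist(u, v)  # raises ValueError if the lengths differ


def cosine_similarity(u, v):
    """Cosine of the angle between two latent vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("latent vectors must have the same length")
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def rank_by_distance(query, library):
    """Return library keys ordered from nearest to farthest from the query."""
    return sorted(library, key=lambda name: euclidean_distance(query, library[name]))


# Hypothetical latent vectors; a real encoder would produce these from SMILES.
toy_library = {
    "mol_a": [0.10, 0.20, 0.30],
    "mol_b": [0.90, 0.80, 0.70],
    "mol_c": [0.12, 0.18, 0.33],
}
toy_query = [0.10, 0.20, 0.31]

nearest_first = rank_by_distance(toy_query, toy_library)
```

Once every molecule is encoded, a comparison costs only a few arithmetic operations per vector component, which is consistent with the abstract's description of the metric as rapidly calculated.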
Pages: 16