VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder

Times cited: 27

Authors:

Samanta, Soumitra ^{[1]}

O'Hagan, Steve ^{[2,4]}

Swainston, Neil ^{[1]}

Roberts, Timothy J. ^{[1]}

Kell, Douglas B. ^{[1,3]}

Affiliations:

[1] Univ Liverpool, Inst Syst Mol & Integrat Biol, Dept Biochem & Syst Biol, Crown St, Liverpool L69 7ZB, Merseyside, England

[2] Univ Manchester, Manchester Inst Biotechnol, Dept Chem, 131 Princess St, Manchester M1 7DN, Lancs, England

[3] Tech Univ Denmark, Novo Nordisk Fdn Ctr Biosustainabil, Bldg 220, DK-2800 Lyngby, Denmark

[4] Univ Coll London Hosp NHS Fdn Trust, 250 Euston Rd, London NW1 2PB, England

Source:

MOLECULES | 2020 / Vol. 25 / Issue 15

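Funding:

UK Biotechnology and Biological Sciences Research Council (BBSRC); UK Engineering and Physical Sciences Research Council (EPSRC)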

Keywords:

cheminformatics; molecular similarity; deep learning; variational autoencoder; SMILES; PYROLYSIS MASS-SPECTROMETRY; DRUG DISCOVERY; MARKETED DRUGS; DESIGN; DESCRIPTORS; FINGERPRINTS; NETWORKS; REPRESENTATION; PROMISCUITY; FOUNDATIONS;

DOI:

10.3390/molecules25153446

Chinese Library Classification (CLC):

Q5 [Biochemistry]; Q7 [Molecular Biology];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Molecular similarity is an elusive but core "unsupervised" cheminformatics concept, yet different "fingerprint" encodings of molecular structures return very different similarity values, even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none are "better" than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similarity, in the form of a variational autoencoder (VAE). This learns the joint distribution p(z|x) where z is a latent vector and x are the (same) input/output data. It takes the form of a "bowtie"-shaped artificial neural network. In the middle is a "bottleneck layer" or latent vector in which inputs are transformed into, and represented as, a vector of numbers (encoding), with a reverse process (decoding) seeking to return the SMILES string that was the input. We train a VAE on over six million druglike molecules and natural products (including over one million in the final holdout set). The VAE vector distances provide a rapid and novel metric for molecular similarity that is both easily and rapidly calculated. We describe the method and its application to a typical similarity problem in cheminformatics.
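
The idea in the abstract (encode each SMILES string to a latent vector, then compare molecules by the distance between those vectors) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the toy alphabet, MAX_LEN, LATENT_DIM, the untrained random ToyEncoder standing in for a trained VAE encoder, the choice of Euclidean distance, and the 1 / (1 + d) mapping from distance to similarity.

```python
import numpy as np

# Hypothetical toy settings; the paper's actual vocabulary, sequence
# length and latent dimension are not stated in this record.
ALPHABET = " #()+-123=@CHNO[]cno"
MAX_LEN = 40
LATENT_DIM = 8


def one_hot(smiles):
    """Pad a SMILES string to MAX_LEN, one-hot encode it, and flatten."""
    padded = smiles.ljust(MAX_LEN)[:MAX_LEN]
    x = np.zeros((MAX_LEN, len(ALPHABET)))
    for i, ch in enumerate(padded):
        x[i, ALPHABET.index(ch)] = 1.0  # raises ValueError on unknown symbols
    return x.ravel()


class ToyEncoder:
    """Stand-in for a trained VAE encoder (assumption: untrained random
    linear maps producing a latent mean and log-variance)."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        d = MAX_LEN * len(ALPHABET)
        self.w_mu = rng.normal(scale=0.1, size=(d, LATENT_DIM))
        self.w_logvar = rng.normal(scale=0.1, size=(d, LATENT_DIM))

    def encode(self, smiles):
        x = one_hot(smiles)
        return x @ self.w_mu, x @ self.w_logvar


def sample_latent(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def latent_distance(enc, a, b):
    """Euclidean distance between the latent means of two SMILES strings."""
    mu_a, _ = enc.encode(a)
    mu_b, _ = enc.encode(b)
    return float(np.linalg.norm(mu_a - mu_b))


def latent_similarity(enc, a, b):
    """Map distance into (0, 1]; 1 means identical latent means."""
    return 1.0 / (1.0 + latent_distance(enc, a, b))


if __name__ == "__main__":
    enc = ToyEncoder(seed=0)
    for query in ("CCN", "c1ccccc1"):
        print("CCO vs", query, latent_similarity(enc, "CCO", query))
```

Because the encoder weights here are random, the printed values carry no chemical meaning; with a trained encoder substituted for ToyEncoder, the same distance computation would be a single vector operation per pair, which is consistent with the abstract's description of the metric as rapidly calculated.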


Pages: 16

50 items in total

[1] GF-VAE: A Flow-based Variational Autoencoder for Molecule Generation
Ma, Changsheng
Zhang, Xiangliang
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 1181 - 1190
[2] SC-VAE: Sparse coding-based variational autoencoder with learned ISTA
Xiao, Pan
Qiu, Peijie
Ha, Sung Min
Bani, Abdalla
Zhou, Shuang
Sotiras, Aristeidis
PATTERN RECOGNITION, 2025, 161
[3] VAE*: A Novel Variational Autoencoder via Revisiting Positive and Negative Samples for Top-N Recommendation
Liu, Wei
Hou, U. Leong
Liang, Shangsong
Zhu, Huaijie
Yu, Jianxing
Liu, Yubao
Yin, Jian
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18 (09)
[4] Vessel Trajectory Similarity Measure Based on Deep Convolutional Autoencoder
Li, Shichen
Liang, Maohan
Liu, Ryan Wen
2020 5TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS (IEEE ICBDA 2020), 2020, : 333 - 338
[5] Multi-Channel Multi-Scale Convolution Attention Variational Autoencoder (MCA-VAE): An Interpretable Anomaly Detection Algorithm Based on Variational Autoencoder
Liu, Jingwen
Huang, Yuchen
Wu, Dizhi
Yang, Yuchen
Chen, Yanru
Chen, Liangyin
Zhang, Yuanyuan
SENSORS, 2024, 24 (16)
[6] A Variational Autoencoder-General Adversarial Networks (VAE-GAN) Based Model for Ligand Designing
Mukesh, K.
Venkata, Srisurya Ippatapu
Chereddy, Spandana
Anbazhagan, E.
Oviya, I. R.
INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING AND COMMUNICATIONS, ICICC 2022, VOL 1, 2023, 473 : 761 - 768
[7] Grad2VAE: An Explainable Variational Autoencoder Model Based on Online Attentions Preserving Curvatures of Representations
Abukmeil, Mohanad
Ferrari, Stefano
Genovese, Angelo
Piuri, Vincenzo
Scotti, Fabio
IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT I, 2022, 13231 : 670 - 681
[8] IE-VAE: A Deep Learning Method for Solving Electromagnetic Inverse Scattering Problems Based on Variational Autoencoder
Wang, Yan
Hu, Shuangxia
Zhao, Linlin
Li, Jinhong
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IV, ICIC 2024, 2024, 14865 : 386 - 397
[9] A novel process monitoring approach based on variational recurrent autoencoder
Cheng, Feifan
He, Q. Peter
Zhao, Jinsong
COMPUTERS & CHEMICAL ENGINEERING, 2019, 129
[10] A Novel similarity measure based on eigenvalue distribution
Huang, Xu
Ghodsi, Mansi
Hassani, Hossein
TRANSACTIONS OF A RAZMADZE MATHEMATICAL INSTITUTE, 2016, 170 (03) : 352 - 362
