Bridging the Gap between Differential Mobility, Log S, and Log P Using Machine Learning and SHAP Analysis

Cited by: 10
Authors
Stienstra, Cailum M. K. [1]
Ieritano, Christian [1]
Haack, Alexander [1]
Hopkins, W. Scott [1,2,3]
Affiliations
[1] Univ Waterloo, Dept Chem, Waterloo, ON N2L 3G1, Canada
[2] Watermine Innovat, Waterloo, ON N0B 2T0, Canada
[3] Ctr Eye & Vis Res, Hong Kong 999077, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
DRUG-DELIVERY REVIEWS; AQUEOUS SOLUBILITY; LIQUID-CHROMATOGRAPHY; PHYSICOCHEMICAL PROPERTIES; ATMOSPHERIC-PRESSURE; PREDICTION; IONS; MOLECULES;
DOI
10.1021/acs.analchem.3c00921
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification codes
070302; 081704;
Abstract
Aqueous solubility, log S, and the water-octanol partition coefficient, log P, are physicochemical properties that are used to screen the viability of drug candidates and to estimate mass transport in the environment. In this work, differential mobility spectrometry (DMS) experiments performed in microsolvating environments are used to train machine learning (ML) frameworks that predict the log S and log P of various molecule classes. In lieu of a consistent source of experimentally measured log S and log P values, the OPERA package was used to evaluate the aqueous solubility and hydrophobicity of 333 analytes. With ion mobility/DMS data (e.g., CCS, dispersion curves) as input, we used ML regressors and ensemble stacking to derive relationships with a high degree of explainability, as assessed via SHapley Additive exPlanations (SHAP) analysis. The DMS-based regression models returned scores of R² = 0.67 and RMSE = 1.03 ± 0.10 for log S predictions and R² = 0.67 and RMSE = 1.20 ± 0.10 for log P after 5-fold random cross-validation. SHAP analysis reveals that the regressors strongly weighted gas-phase clustering in log P correlations. The addition of structural descriptors (e.g., # of aromatic carbons) improved log S predictions to yield RMSE = 0.84 ± 0.07 and R² = 0.78. Similarly, log P predictions using the same data resulted in an RMSE of 0.83 ± 0.04 and R² = 0.84. The SHAP analysis of log P models highlights the need for additional experimental parameters describing hydrophobic interactions. These results were achieved with a smaller dataset (333 instances) and minimal structural correlation compared to purely structure-based models, underscoring the value of employing DMS data in predictive models.
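SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory: each feature's attribution is its weighted average marginal contribution to the model output over all coalitions of the other features. As an illustration only, the sketch below computes exact Shapley values by enumerating every coalition, with features outside a coalition set to one baseline vector. That baseline convention is an assumption; the SHAP package supports several background-data conventions and faster approximations (e.g., KernelSHAP, TreeSHAP). The function `exact_shapley`, the toy model, its weights, and the feature roles are hypothetical and do not represent the authors' pipeline or the `shap` library API.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

# A model maps one feature vector to one prediction (e.g. a log P value).
Model = Callable[[Sequence[float]], float]


def exact_shapley(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley value of each feature for the single instance ``x``.

    Features outside a coalition are replaced by ``baseline`` (an assumed
    single reference point, e.g. training-set means). The model is evaluated
    on the order of n * 2**n times, so keep n small.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Coalition features come from x; all others come from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            # Shapley weight |S|! * (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical linear model over three illustrative descriptors
    # (stand-ins for, e.g., a CCS value, a clustering-related DMS feature,
    # and a count of aromatic carbons); the weights are made up.
    weights = [0.5, -1.2, 0.3]

    def toy_model(z: Sequence[float]) -> float:
        return 1.0 + sum(w * v for w, v in zip(weights, z))

    x = [2.0, 1.0, 4.0]
    baseline = [1.0, 0.5, 2.0]
    phi = exact_shapley(toy_model, x, baseline)
    print([round(p, 6) for p in phi])  # [0.5, -0.6, 0.6]
    # Efficiency property: attributions sum to f(x) - f(baseline)
    print(round(sum(phi), 6), round(toy_model(x) - toy_model(baseline), 6))
```

For a linear model every marginal contribution of feature i equals w_i(x_i - b_i), which gives an easy correctness check, and the attributions always sum to f(x) - f(baseline). Brute-force enumeration is exact but grows exponentially with the number of features, which is why practical SHAP tooling relies on approximations for larger descriptor sets.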
Pages: 10309-10321
Page count: 13