Predicting Solute Descriptors for Organic Chemicals by a Deep Neural Network (DNN) Using Basic Chemical Structures and a Surrogate Metric

Cited by: 19
Authors
Zhang, Kai [1]
Zhang, Huichun [1]
Affiliations
[1] Case Western Reserve Univ, Dept Civil & Environm Engn, Cleveland, OH 44106 USA
Funding
National Science Foundation (USA);
Keywords
chemical similarity; chemical transfer modeling; evaluation metric; model interpretation; octanol-water partition coefficient; PaDEL; pp-LFER descriptors; RDKit; FREE-ENERGY RELATIONSHIPS; PARTITION-COEFFICIENTS; BIOCONCENTRATION FACTOR; QSAR MODELS; BIOACCUMULATION; AIR;
DOI
10.1021/acs.est.1c05398
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science];
Discipline classification code
08; 0830;
Abstract
Solute descriptors have been widely used to model chemical transfer processes through poly-parameter linear free energy relationships (pp-LFERs); however, there are still substantial difficulties in obtaining these descriptors accurately and quickly for new organic chemicals. In this research, models (PaDEL-DNN) that require only SMILES of chemicals were built to satisfactorily estimate pp-LFER descriptors using deep neural networks (DNN) and the PaDEL chemical representation. The PaDEL-DNN-estimated pp-LFER descriptors demonstrated good performance in modeling storage-lipid/water partitioning coefficient (log Kstorage-lipid/water), bioconcentration factor (BCF), aqueous solubility (ESOL), and hydration free energy (freesolve). Then, assuming that the accuracy in the estimated values of widely available properties, e.g., logP (octanol-water partition coefficient), can calibrate estimates for less available but related properties, we proposed logP as a surrogate metric for evaluating the overall accuracy of the estimated pp-LFER descriptors. When using the pp-LFER descriptors to model log Kstorage-lipid/water, BCF, ESOL, and freesolve, we achieved around 0.1 log unit lower errors for chemicals whose estimated pp-LFER descriptors were deemed "accurate" by the surrogate metric. The interpretation of the PaDEL-DNN models revealed that, for a given test chemical, having several (around 5) "similar" chemicals in the training data set was crucial for accurate estimation while the remaining less similar training chemicals provided reasonable baseline estimates. Lastly, pp-LFER descriptors for over 2800 persistent, bioaccumulative, and toxic chemicals were reasonably estimated by combining PaDEL-DNN with the surrogate metric. Overall, the PaDEL-DNN/surrogate metric and newly estimated descriptors will greatly benefit chemical transfer modeling.
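The surrogate-metric idea described in the abstract can be illustrated with a minimal sketch: plug the estimated descriptors into a pp-LFER equation for logP, compare the result with a known logP value, and treat the descriptors as "accurate" when the two agree closely. The sketch assumes the common five-descriptor form (E, S, A, B, V). The coefficients are the values commonly quoted for the Abraham octanol-water equation and the 0.5 log-unit tolerance is hypothetical; neither is confirmed by this record, and the paper may use a different calibration and cutoff. The names `Descriptors`, `pp_lfer`, and `deemed_accurate` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Estimated pp-LFER solute descriptors (five-descriptor form, assumed)."""
    E: float  # excess molar refraction
    S: float  # polarity/polarizability
    A: float  # hydrogen-bond acidity
    B: float  # hydrogen-bond basicity
    V: float  # McGowan characteristic volume


# ASSUMPTION: coefficients commonly quoted for the Abraham octanol-water
# equation; not taken from this record, swap in the calibration you trust.
LOGP_COEFFS = {"c": 0.088, "e": 0.562, "s": -1.054,
               "a": 0.034, "b": -3.460, "v": 3.814}


def pp_lfer(d: Descriptors, k: dict = LOGP_COEFFS) -> float:
    """Evaluate log K = c + eE + sS + aA + bB + vV for one chemical."""
    return (k["c"] + k["e"] * d.E + k["s"] * d.S
            + k["a"] * d.A + k["b"] * d.B + k["v"] * d.V)


def deemed_accurate(d: Descriptors, reference_logp: float,
                    tolerance: float = 0.5) -> bool:
    """Surrogate check: do the descriptors reproduce a known logP?

    ASSUMPTION: the 0.5 log-unit tolerance is a hypothetical cutoff.
    """
    return abs(pp_lfer(d) - reference_logp) <= tolerance


# Hypothetical descriptor values, for illustration only.
example = Descriptors(E=1.0, S=1.0, A=0.0, B=0.5, V=1.0)
print(round(pp_lfer(example), 3))
print(deemed_accurate(example, reference_logp=2.0))
```

Chemicals that pass such a check would be the ones whose descriptors are then used with more confidence for less available properties such as BCF or log Kstorage-lipid/water.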
Pages: 2054-2064
Number of pages: 11