Harnessing Shannon entropy-based descriptors in machine learning models to enhance the prediction accuracy of molecular properties

Cited by: 9
Authors
Guha, Rajarshi [1]
Velegol, Darrell [2]
Affiliations
[1] Intel Corp, 2501 NE Century Blvd, Hillsboro, OR 97124 USA
[2] Penn State Univ, Dept Chem Engn, University Pk, PA 16802 USA
Funding
U.S. National Science Foundation;
Keywords
SMILES; Shannon entropy; SEF; Deep neural networks; MLP; GNN; kNN; Machine learning; CHEMICAL-STRUCTURES; LINE NOTATION
DOI
10.1186/s13321-023-00712-0
Chinese Library Classification number
O6 [Chemistry];
Subject classification code
0703;
Abstract
Accurate prediction of molecular properties is essential in the screening and development of drug molecules and other functional materials. Traditionally, property-specific molecular descriptors are used in machine learning models, which in turn requires identifying and developing target- or problem-specific descriptors. Moreover, targeted descriptors alone do not always make it feasible to increase a model's prediction accuracy. We explored these accuracy and generalizability issues using a framework of Shannon entropies based on the SMILES, SMARTS and/or InChIKey strings of the respective molecules. Using various public databases of molecules, we showed that the prediction accuracy of machine learning models could be significantly enhanced simply by using Shannon entropy-based descriptors evaluated directly from SMILES. Analogous to the partial pressures and total pressure of gases in a mixture, we used atom-wise fractional Shannon entropy in combination with the total Shannon entropy of the respective tokens of the string representation to model the molecule efficiently. The proposed descriptor was competitive in performance with standard descriptors such as Morgan fingerprints and SHED in regression models. Additionally, we found that either a hybrid descriptor set containing the Shannon entropy-based descriptors or an optimized ensemble architecture of multilayer perceptrons and graph neural networks using the Shannon entropies acted synergistically to improve the prediction accuracy. This simple approach of coupling the Shannon entropy framework to other standard descriptors and/or using it in ensemble models could find applications in boosting the performance of molecular property predictions in chemistry and materials science.
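The abstract does not give the exact descriptor definitions, so the following is only a minimal sketch of the idea, under stated assumptions. It treats each character of a SMILES string as a token (a simplification that splits two-letter atoms such as Cl and Br), computes the total Shannon entropy of the token distribution, and then, by analogy with partial pressure = mole fraction × total pressure, takes the fractional entropy of a token to be its frequency fraction times the total entropy. The function names `shannon_entropy` and `fractional_entropies` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Total Shannon entropy (in bits) of the token frequency distribution."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def fractional_entropies(tokens):
    """Per-token share of the total entropy (assumed: frequency fraction x total).

    This mirrors the partial-pressure analogy; the paper's exact
    definition may differ.
    """
    total = shannon_entropy(tokens)
    n = len(tokens)
    return {tok: (c / n) * total for tok, c in Counter(tokens).items()}


# Assumption: naive character-level tokenization of the SMILES string.
smiles = "CCO"  # ethanol
tokens = list(smiles)

total_entropy = shannon_entropy(tokens)
per_token = fractional_entropies(tokens)

print(f"total entropy: {total_entropy:.4f} bits")
for tok, value in per_token.items():
    print(f"  {tok}: {value:.4f}")
```

Under this assumed definition the fractional values sum to the total entropy, just as partial pressures sum to the total pressure. A real pipeline would more likely use a chemistry-aware SMILES tokenizer and concatenate these values with other descriptors before model training.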
Pages: 11