ESPDHot: An Effective Machine Learning-Based Approach for Predicting Protein-DNA Interaction Hotspots

Cited by: 2
Authors
Tao, Lianci [1]
Zhou, Tong [1]
Wu, Zhixiang [1]
Hu, Fangrui [1]
Yang, Shuang [1]
Kong, Xiaotian [1]
Li, Chunhua [1]
Affiliations
[1] Beijing Univ Technol, Coll Chem & Life Sci, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
SECONDARY STRUCTURE; FEATURE-SELECTION; INFORMATION; RESIDUES; SMOTE; SETS;
DOI
10.1021/acs.jcim.3c02011
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Protein-DNA interactions are pivotal to various cellular processes. Precise identification of hotspot residues in protein-DNA interactions is important for revealing the mechanisms of protein-DNA recognition and for guiding protein engineering. This work introduces ESPDHot, a prediction method for protein-DNA interaction hotspots based on a stacked ensemble machine learning framework. Here, an interface residue whose mutation leads to a binding free energy change (ΔΔG) exceeding 2 kcal/mol is defined as a hotspot. To address the imbalanced data set, adaptive synthetic sampling (ADASYN), an oversampling technique, is adopted to generate synthetic minority-class samples. In addition to traditional features, we introduce three new feature types: a residue interface preference that we propose, residue fluctuation dynamics characteristics, and coevolutionary features. Combining the Boruta method with our previously developed Random Grouping strategy, we obtain an optimal feature set. Finally, a stacking classifier is constructed to output the predictions; it integrates three classical predictors, Support Vector Machine (SVM), XGBoost, and Artificial Neural Network (ANN), as the first layer and a Logistic Regression (LR) algorithm as the second. Notably, ESPDHot outperforms the current state-of-the-art predictors on the independent test data set, with F1, MCC, and AUC reaching 0.571, 0.516, and 0.870, respectively.
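The hotspot definition and two of the reported metrics can be illustrated with a short sketch. It is not the authors' code: the ΔΔG values and predictions below are made up for illustration, the function names are hypothetical, and "exceeding 2 kcal/mol" is assumed to mean strictly greater than 2. AUC is omitted because it needs predicted scores rather than hard labels.

```python
from math import sqrt

HOTSPOT_THRESHOLD = 2.0  # kcal/mol, the cutoff stated in the abstract


def label_hotspots(ddg_values, threshold=HOTSPOT_THRESHOLD):
    """Label a residue 1 (hotspot) if its ddG is strictly above the threshold, else 0."""
    return [1 if ddg > threshold else 0 for ddg in ddg_values]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical ddG values (kcal/mol) for six interface residues.
ddg = [0.3, 2.5, 1.9, 3.1, 2.0, 0.8]
y_true = label_hotspots(ddg)
# Hypothetical classifier output for the same residues.
y_pred = [0, 1, 1, 1, 0, 0]

print("labels:", y_true)
print("F1:", round(f1_score(y_true, y_pred), 3))
print("MCC:", round(mcc(y_true, y_pred), 3))
```

MCC uses all four cells of the confusion matrix, which is why it is commonly reported alongside F1 when the hotspot class is the minority.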
Pages: 3548-3557
Page count: 10