ESPDHot: An Effective Machine Learning-Based Approach for Predicting Protein-DNA Interaction Hotspots

Cited by: 2
Authors
Tao, Lianci [1]
Zhou, Tong [1]
Wu, Zhixiang [1]
Hu, Fangrui [1]
Yang, Shuang [1]
Kong, Xiaotian [1]
Li, Chunhua [1]
Affiliations
[1] Beijing Univ Technol, Coll Chem & Life Sci, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
SECONDARY STRUCTURE; FEATURE-SELECTION; INFORMATION; RESIDUES; SMOTE; SETS;
DOI
10.1021/acs.jcim.3c02011
Chinese Library Classification
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Protein-DNA interactions are pivotal to many cellular processes. Precisely identifying the hotspot residues in these interactions is important for revealing the mechanisms of protein-DNA recognition and for guiding protein engineering. This work introduces ESPDHot, a method for predicting protein-DNA interaction hotspots based on a stacked ensemble machine learning framework. Here, an interface residue whose mutation leads to a binding free energy change (ΔΔG) exceeding 2 kcal/mol is defined as a hotspot. To address data set imbalance, adaptive synthetic sampling (ADASYN), an oversampling technique, is used to generate synthetic minority-class samples. In addition to traditional features, we introduce three new feature types: a residue interface preference that we propose, residue fluctuation dynamics characteristics, and coevolutionary features. Combining the Boruta method with our previously developed Random Grouping strategy, we obtained an optimal feature set. Finally, a stacking classifier produces the predictions: three classical predictors, Support Vector Machine (SVM), XGBoost, and Artificial Neural Network (ANN), form the first layer, and a Logistic Regression (LR) model forms the second. ESPDHot outperforms current state-of-the-art predictors on the independent test data set, with F1, MCC, and AUC reaching 0.571, 0.516, and 0.870, respectively.
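The hotspot definition and two of the reported metrics (F1 and MCC) can be illustrated with a minimal sketch. It is not the authors' code: the function names, the ΔΔG values, and the predictions are hypothetical, and treating a residue at exactly 2 kcal/mol as a non-hotspot is an assumption based on the word "exceeding" in the abstract.

```python
from math import sqrt

HOTSPOT_THRESHOLD = 2.0  # kcal/mol, the cutoff stated in the abstract


def label_hotspots(ddg_values, threshold=HOTSPOT_THRESHOLD):
    """Label a residue 1 (hotspot) if its ddG exceeds the threshold, else 0.

    Assumption: the comparison is strict, so exactly 2.0 is a non-hotspot.
    """
    return [1 if ddg > threshold else 0 for ddg in ddg_values]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical ddG values (kcal/mol) for eight interface residues.
ddg = [2.5, 0.3, 1.9, 3.1, 0.0, 2.0, 1.2, 0.8]
y_true = label_hotspots(ddg)

# Hypothetical classifier output for the same residues.
y_pred = [1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
m = mcc(tp, fp, tn, fn)
print(f"F1={f1:.3f}  MCC={m:.3f}")
```

MCC uses all four confusion-matrix cells, which is why it is often reported alongside F1 for imbalanced data sets such as this one, where hotspots are the minority class. AUC is omitted here because it requires ranked prediction scores rather than hard labels.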
Pages: 3548-3557
Number of pages: 10