ESPDHot: An Effective Machine Learning-Based Approach for Predicting Protein-DNA Interaction Hotspots

Cited by: 2
Authors
Tao, Lianci [1 ]
Zhou, Tong [1 ]
Wu, Zhixiang [1 ]
Hu, Fangrui [1 ]
Yang, Shuang [1 ]
Kong, Xiaotian [1 ]
Li, Chunhua [1 ]
Affiliations
[1] Beijing Univ Technol, Coll Chem & Life Sci, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
SECONDARY STRUCTURE; FEATURE-SELECTION; INFORMATION; RESIDUES; SMOTE; SETS;
DOI
10.1021/acs.jcim.3c02011
Chinese Library Classification
R914 [Medicinal Chemistry];
Discipline classification code
100701;
Abstract
Protein-DNA interactions are pivotal to various cellular processes. Precisely identifying the hotspot residues in protein-DNA interactions is important for revealing the mechanisms of protein-DNA recognition and for guiding protein engineering. This work introduces ESPDHot, an effective method for predicting protein-DNA interaction hotspots based on a stacked ensemble machine learning framework. Here, an interface residue whose mutation leads to a binding free energy change (ΔΔG) exceeding 2 kcal/mol is defined as a hotspot. To address data set imbalance, adaptive synthetic sampling (ADASYN), an oversampling technique, is adopted to generate synthetic minority-class samples. In addition to traditional features, we introduce three new feature types: a residue interface preference proposed by us, residue fluctuation dynamics characteristics, and coevolutionary features. Combining the Boruta method with our previously developed Random Grouping strategy, we obtained an optimal set of features. Finally, a stacking classifier is constructed to output the predictions; it integrates three classical predictors, Support Vector Machine (SVM), XGBoost, and Artificial Neural Network (ANN), as the first layer and a Logistic Regression (LR) model as the second. ESPDHot outperforms the current state-of-the-art predictors on the independent test data set, with F1, MCC, and AUC reaching 0.571, 0.516, and 0.870, respectively.
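The hotspot definition and two of the reported metrics can be illustrated with a minimal sketch. It is not the authors' code: the function names, the strict "greater than" reading of "exceeding 2 kcal/mol", and all data values are assumptions made for illustration only. It labels residues by their binding free energy change and computes F1 and the Matthews correlation coefficient (MCC) from the confusion counts.

```python
from math import sqrt

# Threshold from the abstract; the strict ">" comparison is an assumption.
HOTSPOT_THRESHOLD = 2.0  # kcal/mol


def label_hotspots(ddg_values, threshold=HOTSPOT_THRESHOLD):
    """Return 1 for residues whose binding free energy change exceeds the threshold, else 0."""
    return [1 if ddg > threshold else 0 for ddg in ddg_values]


def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives and true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0.0 when undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical per-residue values (kcal/mol) and hypothetical predictions.
ddg = [0.3, 2.5, 1.1, 3.0, 0.0, 2.1, 0.8, 1.9]
y_true = label_hotspots(ddg)
y_pred = [0, 1, 1, 1, 0, 0, 0, 0]

print("labels:", y_true)
print("F1:", round(f1_score(y_true, y_pred), 3))
print("MCC:", round(mcc(y_true, y_pred), 3))
```

Both metrics focus on the minority (hotspot) class rather than overall accuracy, which is why they are commonly reported for imbalanced data sets such as this one.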
Pages: 3548-3557
Number of pages: 10