ESPDHot: An Effective Machine Learning-Based Approach for Predicting Protein-DNA Interaction Hotspots

Cited by: 2
Authors
Tao, Lianci [1]
Zhou, Tong [1]
Wu, Zhixiang [1]
Hu, Fangrui [1]
Yang, Shuang [1]
Kong, Xiaotian [1]
Li, Chunhua [1]
Affiliations
[1] Beijing Univ Technol, Coll Chem & Life Sci, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
SECONDARY STRUCTURE; FEATURE-SELECTION; INFORMATION; RESIDUES; SMOTE; SETS;
DOI
10.1021/acs.jcim.3c02011
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Protein-DNA interactions are pivotal to various cellular processes. Precise identification of hotspot residues in protein-DNA interactions is important for revealing the mechanisms of protein-DNA recognition and for guiding protein engineering. This work introduces ESPDHot, a method for predicting protein-DNA interaction hotspots based on a stacked ensemble machine learning framework. Here, an interface residue whose mutation leads to a binding free energy change (ΔΔG) exceeding 2 kcal/mol is defined as a hotspot. To address data set imbalance, adaptive synthetic sampling (ADASYN), an oversampling technique, is used to synthetically generate new minority-class samples. In addition to traditional features, we introduce three new feature types: a residue interface preference proposed by us, residue fluctuation dynamics characteristics, and coevolutionary features. An optimal feature set was obtained by combining the Boruta method with our previously developed Random Grouping strategy. Finally, a stacking classifier produces the predictions; it integrates three classical predictors, Support Vector Machine (SVM), XGBoost, and Artificial Neural Network (ANN), as the first layer and a Logistic Regression (LR) algorithm as the second. Notably, ESPDHot outperforms current state-of-the-art predictors on the independent test data set, with F1, MCC, and AUC reaching 0.571, 0.516, and 0.870, respectively.
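The hotspot definition and two of the reported metrics can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the ΔΔG values, and the predictions below are hypothetical, and the strict "greater than" comparison is an assumption based on the word "exceeding" in the abstract. F1 and MCC are computed from confusion-matrix counts using their standard definitions.

```python
import math

# Threshold from the abstract (kcal/mol); strict ">" is an assumption.
HOTSPOT_THRESHOLD = 2.0


def label_hotspots(ddg_values, threshold=HOTSPOT_THRESHOLD):
    """Label each residue 1 (hotspot) if its ddG exceeds the threshold, else 0."""
    return [1 if ddg > threshold else 0 for ddg in ddg_values]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical ddG values for eight interface residues (not from the paper).
ddg = [0.3, 2.5, 1.9, 3.1, 0.0, 2.0, 0.8, 4.2]
y_true = label_hotspots(ddg)
# Hypothetical classifier output for the same residues.
y_pred = [0, 1, 1, 1, 0, 0, 0, 0]

f1 = f1_score(y_true, y_pred)
m = mcc(y_true, y_pred)
print(f"labels={y_true} F1={f1:.3f} MCC={m:.3f}")
```

MCC is often reported alongside F1 for imbalanced problems such as hotspot prediction because it also accounts for true negatives, which F1 ignores.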
Pages: 3548-3557
Page count: 10