Use of tree-based machine learning methods to screen affinitive peptides based on docking data

Cited by: 1

Authors:

Feng, Hua [1]

Wang, Fangyu [1]

Li, Ning [1]

Xu, Qian [1]

Zheng, Guanming [1,2]

Sun, Xuefeng [1]

Hu, Man [1]

Li, Xuewu [1]

Xing, Guangxu [1]

Zhang, Gaiping [1,3,4,5]

Affiliations:

[1] Henan Agr Univ, Coll Food Sci & Technol, Zhengzhou, Peoples R China

[2] Henan Univ Chinese Med, Publ Hlth & Prevent Med Teaching & Res Ctr, Zhengzhou, Henan, Peoples R China

[3] Longhu Modern Immunol Lab, Zhengzhou, Peoples R China

[4] Peking Univ, Sch Adv Agr Sci, Beijing, Peoples R China

[5] Yangzhou Univ, Jiangsu Coinnovat Ctr Prevent & Control Important, Yangzhou, Jiangsu, Peoples R China

Source:

MOLECULAR INFORMATICS | 2023 / Volume 42 / Issue 12

Keywords:

affinity classification; docking data; machine learning; peptides; tree-based algorithms; FEATURE-SELECTION; PROTEIN; BINDING;

DOI:

10.1002/minf.202300143

Chinese Library Classification (CLC) number:

R914 [Medicinal Chemistry];

Discipline classification code:

100701 ;

Abstract:

Screening peptides with good affinity is an important step in peptide-drug discovery. Recent advances in computer and data science have made machine learning a useful tool for accurately screening affinitive peptides. In the current study, four tree-based algorithms, namely classification and regression trees (CART), the C5.0 decision tree (C50), bagged CART (BAG) and random forest (RF), were employed to explore the relationship between experimental peptide affinities and virtual docking data, and the performance of the models was compared in parallel. All four algorithms performed better on the dataset pre-processed by scaling, centering and PCA than on the other pre-processed datasets. After model rebuilding and hyperparameter optimization, the optimal C50 model (C50O) showed the best performance in terms of Accuracy, Kappa, Sensitivity, Specificity, F1, MCC and AUC when validated on the test data and on an unknown PEDV dataset (Accuracy = 80.4%). BAG and RFO (the optimal RF), the two best models during training, did not perform as expected in the test and unknown-dataset validations. Furthermore, the high correlation of the RFO and BAG predictions with those of C50O implied high stability and robustness of their predictions. In contrast, despite its good performance on the unknown dataset, the poor performance of CARTO in test-data validation and correlation analysis indicated that it could not be used for future data prediction. To accurately evaluate peptide affinity, the current study is the first to compare tree-based models for affinitive-peptide prediction using virtual docking data, which would expand the application of machine learning algorithms in studying PepPIs and benefit the development of peptide therapeutics.
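As an illustration of the evaluation metrics named in the abstract, the following is a minimal Python sketch of how Accuracy, Kappa, Sensitivity, Specificity, F1 and MCC can be derived from a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper; AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; assumes every denominator
    below is non-zero (both classes are present and both are predicted).
    """
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # Matthews correlation coefficient.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Made-up counts, not results from the study.
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa and MCC both correct for chance-level agreement, which is why they can rank models differently from raw Accuracy when the affinity classes are imbalanced.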

Pages: 11
