Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree

Cited by: 6
Authors
Helmy, Marwa [1]
Eldaydamony, Eman [1]
Mekky, Nagham [1]
Elmogy, Mohammed [1]
Soliman, Hassan [1]
Affiliations
[1] Mansoura Univ, Fac Comp & Informat, Informat Technol Dept, Mansoura 35516, Egypt
Source
SCIENTIFIC REPORTS | 2022 / Volume 12 / Issue 01
Keywords
FEATURE-SELECTION; IDENTIFICATION; CLASSIFIER; EXPRESSION; DATABASE;
DOI
10.1038/s41598-022-14127-8
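Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract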
Identifying genes related to Parkinson's disease (PD) is an active research topic in biomedical analysis and plays a critical role in diagnosis and treatment. Many recent studies have proposed techniques for predicting disease-related genes, but only a few of these techniques were designed or developed for PD gene prediction. Most of the PD techniques identify only protein genes and discard long noncoding RNA (lncRNA) genes, which play an essential role in biological processes and in the transformation and development of diseases. This paper proposes a novel prediction system that identifies both protein and lncRNA genes related to PD and can aid early diagnosis. First, we preprocessed the genes into DNA FASTA sequences obtained from the University of California Santa Cruz (UCSC) genome browser and removed redundancies. Second, we extracted significant features from the DNA FASTA sequences using the PyFeat method, with AdaBoost for feature selection. The selected features achieved promising results compared with features extracted by several state-of-the-art feature extraction techniques. Finally, the features were fed to a gradient-boosted decision tree (GBDT) to classify the tested cases. Seven performance metrics were used to evaluate the proposed system. It achieved an average accuracy of 78.6%, an area under the curve (AUC) of 84.5%, an area under the precision-recall curve (AUPR) of 85.3%, an F1-score of 78.3%, a Matthews correlation coefficient (MCC) of 0.575, a sensitivity (SEN) of 77.1%, and a specificity (SPC) of 80.2%. The experiments demonstrate promising results compared with other systems. The top-ranked predicted protein and lncRNA genes were verified against the literature.
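As a rough illustration of how several of the metrics named in the abstract are defined, the Python sketch below computes accuracy, sensitivity, specificity, F1-score, and MCC from binary labels, and AUC from ranking scores. The function names (`binary_metrics`, `roc_auc`) and the toy data are hypothetical and are not taken from the paper; AUPR is omitted for brevity.

```python
import math


def binary_metrics(y_true, y_pred):
    """Threshold-based metrics from 0/1 labels (1 = PD-related, by assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # recall on the positive class
    specificity = ratio(tn, tn + fp)   # recall on the negative class
    precision = ratio(tp, tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is the O(n^2) pairwise form,
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented purely for demonstration.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
    print(binary_metrics(labels, preds))
    print(roc_auc(labels, scores))
```

Unlike the percentage metrics, MCC ranges from -1 to 1, which is why the abstract reports it as 0.575 rather than as a percentage.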
Pages: 26
Related papers
50 items in total
  • [1] Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree
    Marwa Helmy
    Eman Eldaydamony
    Nagham Mekky
    Mohammed Elmogy
    Hassan Soliman
    Scientific Reports, 12 (1)
  • [2] Predicting Parkinson's disease using gradient boosting decision tree models with electroencephalography signals
    Lee, Seung-Bo
    Kim, Yong-Jeong
    Hwang, Sungeun
    Son, Hyoshin
    Lee, Sang Kun
    Park, Kyung-Il
    Kim, Young-Gon
    PARKINSONISM & RELATED DISORDERS, 2022, 95 : 77 - 85
  • [3] Gradient Boosted Decision Tree based Classification for Recognizing Human Behavior
    Priyadarshini, R. K.
    Banu, Bazila A.
    Nagamani, T.
    PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING & COMMUNICATION ENGINEERING (ICACCE-2019), 2019,
  • [4] Efficient Gradient Boosted Decision Tree Training on GPUs
    Wen, Zeyi
    He, Bingsheng
    Ramamohanarao, Kotagiri
    Lu, Shengliang
    Shi, Jiashuai
    2018 32ND IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2018, : 234 - 243
  • [5] A gradient boosted decision tree-based sentiment classification of twitter data
    Neelakandan, S.
    Paulraj, D.
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2020, 18 (04)
  • [6] Gradient Boosted Decision Tree to Model Ustekinumab Trough Levels in Crohn's Disease
    Saleh, Adam A.
    Miroballi, Natalia
    Stading, Rachel
    Glassner, Kerri
    Abraham, Bincy
    AMERICAN JOURNAL OF GASTROENTEROLOGY, 2022, 117 (10): : S601 - S602
  • [7] Gradient Boosting Decision Tree-Based Method for Predicting Interactions Between Target Genes and Drugs
    Xuan, Ping
    Sun, Chang
    Zhang, Tiangang
    Ye, Yilin
    Shen, Tonghui
    Dong, Yihua
    FRONTIERS IN GENETICS, 2019, 10
  • [8] Gradient Boosted Decision Tree Algorithms for Medicare Fraud Detection
    Hancock J.T.
    Khoshgoftaar T.M.
    SN Computer Science, 2021, 2 (4)
  • [9] Comparison of Decision Tree Classification Methods and Gradient Boosted Trees
    Dikananda, Arif Rinaldi
    Jumini, Sri
    Tarihoran, Nafan
    Christinawati, Santy
    Trimastuti, Wahyu
    Rahim, Robbi
    TEM JOURNAL-TECHNOLOGY EDUCATION MANAGEMENT INFORMATICS, 2022, 11 (01): : 316 - 322
  • [10] An Extension of Gradient Boosted Decision Tree incorporating Statistical Tests
    Sakata, Ryuji
    Ohama, Iku
    Taniguchi, Tadahiro
    2018 18TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2018, : 964 - 969