Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree

Cited by: 6
Authors
Helmy, Marwa [1]
Eldaydamony, Eman [1]
Mekky, Nagham [1]
Elmogy, Mohammed [1]
Soliman, Hassan [1]
Affiliation
[1] Mansoura Univ, Fac Comp & Informat, Informat Technol Dept, Mansoura 35516, Egypt
Source
SCIENTIFIC REPORTS | 2022 / Vol. 12 / Issue 01
Keywords
FEATURE-SELECTION; IDENTIFICATION; CLASSIFIER; EXPRESSION; DATABASE
DOI
10.1038/s41598-022-14127-8
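Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract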
Identifying genes related to Parkinson's disease (PD) is an active research topic in biomedical analysis and plays a critical role in PD diagnosis and treatment. Many recent studies have proposed techniques for predicting disease-related genes, but only a few of them are designed or developed for PD gene prediction. Moreover, most of these PD techniques identify only protein-coding genes and discard long noncoding RNA (lncRNA) genes, which play an essential role in biological processes and in the transformation and development of diseases. This paper proposes a novel prediction system that identifies protein and lncRNA genes related to PD and can aid in early diagnosis. First, we preprocessed the genes into DNA sequences in FASTA format obtained from the University of California Santa Cruz (UCSC) genome browser and removed redundant sequences. Second, we extracted features from the DNA FASTA sequences using the PyFeat method and selected the most significant ones using AdaBoost. The selected features compared favorably with features extracted by several state-of-the-art feature extraction techniques. Finally, the selected features were fed to a gradient-boosted decision tree (GBDT) to classify the test cases. Seven performance metrics were used to evaluate the proposed system, which achieved an average accuracy of 78.6%, an area under the ROC curve (AUC) of 84.5%, an area under the precision-recall curve (AUPR) of 85.3%, an F1-score of 78.3%, a Matthews correlation coefficient (MCC) of 0.575, a sensitivity (SEN) of 77.1%, and a specificity (SPC) of 80.2%. The experiments show that the system compares favorably with other systems. The top-ranked predicted protein and lncRNA genes were verified through a literature review.
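The seven evaluation metrics named in the abstract can be made concrete with a short sketch. The Python below is not the authors' evaluation code; it assumes binary labels (1 = PD-related gene, 0 = not), hard class predictions for ACC, SEN, SPC, F1 and MCC, and continuous scores (for example, GBDT probabilities) for AUC and AUPR. AUC is computed as the Mann-Whitney rank statistic and AUPR as step-wise average precision, which is one common estimator; the paper may have used a different one. The function names, the 0.5 decision threshold, and the toy inputs are hypothetical.

```python
import math
from typing import Dict, Sequence


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """ACC, SEN, SPC, F1 and MCC from binary labels (1 = PD-related, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    sen = tp / (tp + fn) if tp + fn else 0.0  # sensitivity = recall on positives
    spc = tn / (tn + fp) if tn + fp else 0.0  # specificity = recall on negatives
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sen / (prec + sen) if prec + sen else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SEN": sen, "SPC": spc, "F1": f1, "MCC": mcc}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic; tied scores share an average rank."""
    ranked = sorted(zip(scores, y_true))
    ranks = [0.0] * len(ranked)
    i = 0
    while i < len(ranked):
        j = i
        while j + 1 < len(ranked) and ranked[j + 1][0] == ranked[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n_pos = sum(1 for _, t in ranked if t == 1)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both positive and negative samples")
    pos_rank_sum = sum(r for r, (_, t) in zip(ranks, ranked) if t == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPR estimate: sum over thresholds of (R_k - R_{k-1}) * P_k."""
    ranked = sorted(zip(scores, y_true), key=lambda st: st[0], reverse=True)
    n_pos = sum(1 for t in y_true if t == 1)
    if n_pos == 0:
        raise ValueError("AUPR needs at least one positive sample")
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume every sample sharing this score so ties form a single threshold.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Toy, made-up inputs; not data from the paper.
    labels = [1, 1, 0, 0, 1, 0]
    probs = [0.9, 0.4, 0.6, 0.1, 0.8, 0.3]
    preds = [1 if p >= 0.5 else 0 for p in probs]  # hypothetical 0.5 threshold
    print(threshold_metrics(labels, preds))
    print("AUC", roc_auc(labels, probs), "AUPR", average_precision(labels, probs))
```

In practice, scikit-learn's `roc_auc_score`, `average_precision_score`, `f1_score` and `matthews_corrcoef` compute the same quantities; the hand-written version only spells out what each number in the abstract measures.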
Pages: 26
Related papers (50 in total)
  • [21] Empirical Measurement of Performance Maintenance of Gradient Boosted Decision Tree Models for Malware Detection
    Galen, Colin
    Steele, Robert
    3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021), 2021: 193-198
  • [22] swGBDT: Efficient Gradient Boosted Decision Tree on Sunway Many-Core Processor
    Yin, Bohong
    Li, Yunchun
    Dun, Ming
    You, Xin
    Yang, Hailong
    Luan, Zhongzhi
    Qian, Depei
    SUPERCOMPUTING FRONTIERS (SCFA 2020), 2020, 12082: 67-86
  • [23] Big Data Analytics Framework Using Squirrel Search Optimized Gradient Boosted Decision Tree for Heart Disease Diagnosis
    Shaik, Kareemulla
    Ramesh, Janjhyam Venkata Naga
    Mahdal, Miroslav
    Rahman, Mohammad Zia Ur
    Khasim, Syed
    Kalita, Kanak
    APPLIED SCIENCES-BASEL, 2023, 13 (09)
  • [24] Predicting Soil Available Phosphorus by Hyperspectral Regression Method Based on Gradient Boosting Decision Tree
    Jin Xiu
    Zhu Xianzhi
    Li Shaowen
    Wang Wencai
    Qi Haijun
    LASER & OPTOELECTRONICS PROGRESS, 2019, 56 (13)
  • [25] Ensemble Gradient Boosted Tree for SoH Estimation Based on Diagnostic Features
    Khaleghi, Sahar
    Firouz, Yousef
    Berecibar, Maitane
    Van Mierlo, Joeri
    Van Den Bossche, Peter
    ENERGIES, 2020, 13 (05)
  • [26] Predicting and interpreting financial distress using a weighted boosted tree-based tree
    Liu, Wanan
    Fan, Hong
    Xia, Min
    Pang, Congyuan
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 116
  • [27] Predicting potential miRNA-disease associations by combining gradient boosting decision tree with logistic regression
    Zhou, Su
    Wang, Shulin
    Wu, Qi
    Azim, Riasat
    Li, Wen
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2020, 85
  • [28] Forecasting the default risk of Chinese listed companies using a gradient-boosted decision tree based on the undersampling technique
    Wang, Shanshan
    Chi, Guotai
    Zhou, Ying
    Chen, Li
    JOURNAL OF RISK MODEL VALIDATION, 2023, 17 (04): 97-121
  • [29] Detection of mild cognitive impairment in Parkinson's disease using gradient boosting decision tree models based on multilevel DTI indices
    Chen, Boyu
    Xu, Ming
    Yu, Hongmei
    He, Jiachuan
    Li, Yingmei
    Song, Dandan
    Fan, Guo Guang
    JOURNAL OF TRANSLATIONAL MEDICINE, 2023, 21 (01)