Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree

Cited by: 6
Authors
Helmy, Marwa [1]
Eldaydamony, Eman [1]
Mekky, Nagham [1]
Elmogy, Mohammed [1]
Soliman, Hassan [1]
Affiliation
[1] Mansoura Univ, Fac Comp & Informat, Informat Technol Dept, Mansoura 35516, Egypt
Source
SCIENTIFIC REPORTS | 2022 / Vol. 12 / Issue 01
Keywords
FEATURE-SELECTION; IDENTIFICATION; CLASSIFIER; EXPRESSION; DATABASE
DOI
10.1038/s41598-022-14127-8
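Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract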
Identifying genes related to Parkinson's disease (PD) is an active research topic in biomedical analysis and plays a critical role in diagnosis and treatment. Recently, many studies have proposed techniques for predicting disease-related genes; however, only a few of them were designed for PD gene prediction. Moreover, most of these PD techniques identify only protein-coding genes and discard long noncoding RNA (lncRNA) genes, which play an essential role in biological processes and in the development and progression of diseases. This paper proposes a novel prediction system that identifies both protein-coding and lncRNA genes related to PD and can aid in early diagnosis. First, we retrieved the DNA sequences of the genes in FASTA format from the University of California Santa Cruz (UCSC) genome browser and removed redundant sequences. Second, we extracted features from the DNA sequences using PyFeat and applied AdaBoost for feature selection to retain the most significant ones. These selected features performed well relative to features extracted by several state-of-the-art feature extraction techniques. Finally, the selected features were fed to a gradient-boosted decision tree (GBDT) classifier to classify the test cases. Seven performance metrics were used to evaluate the proposed system, which achieved an average accuracy of 78.6%, an area under the curve (AUC) of 84.5%, an area under the precision-recall curve (AUPR) of 85.3%, an F1-score of 78.3%, a Matthews correlation coefficient (MCC) of 0.575, a sensitivity (SEN) of 77.1%, and a specificity (SPC) of 80.2%. The experiments show promising results compared with other systems. The top-ranked predicted protein-coding and lncRNA genes were verified through a literature review.
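The seven metrics listed in the abstract have standard definitions. The following is a minimal, standard-library-only Python sketch, not the authors' evaluation code, that computes them from binary labels (1 = PD-related gene, 0 = unrelated gene) and classifier scores. The function names are hypothetical, and the 0.5 decision threshold, the pairwise (Mann-Whitney) estimate of AUC, and the use of average precision as the AUPR estimate are all assumptions, since the abstract does not state how these quantities were computed.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AUPR estimated as average precision: mean precision at the rank of each positive."""
    # Sort indices by descending score; tied scores keep input order (an assumption of this sketch).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / sum(y_true)


def evaluate(y_true, scores, threshold=0.5):
    """Hypothetical helper returning ACC, AUC, AUPR, F1, MCC, SEN and SPC."""
    # Threshold the scores to obtain hard predictions (0.5 is an assumed cut-off).
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sen = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
    spc = tn / (tn + fp) if tn + fp else 0.0   # specificity
    prec = tp / (tp + fp) if tp + fp else 0.0  # precision
    f1 = 2 * prec * sen / (prec + sen) if prec + sen else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "ACC": (tp + tn) / len(y_true),
        "AUC": roc_auc(y_true, scores),
        "AUPR": average_precision(y_true, scores),
        "F1": f1,
        "MCC": mcc,
        "SEN": sen,
        "SPC": spc,
    }


if __name__ == "__main__":
    # Toy labels and scores for illustration only (not data from the paper).
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
    for name, value in evaluate(labels, scores).items():
        print(f"{name}: {value:.3f}")
```

On the toy data this prints ACC 0.667, AUC 0.889, AUPR 0.917, MCC 0.333, and 0.667 for F1, SEN and SPC. Note that AUC and AUPR are computed from the raw scores, whereas ACC, F1, MCC, SEN and SPC depend on the chosen threshold.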
Pages: 26