Predicting Parkinson disease related genes based on PyFeat and gradient boosted decision tree

Cited by: 6
Authors
Helmy, Marwa [1]
Eldaydamony, Eman [1]
Mekky, Nagham [1]
Elmogy, Mohammed [1]
Soliman, Hassan [1]
Affiliations
[1] Mansoura Univ, Fac Comp & Informat, Informat Technol Dept, Mansoura 35516, Egypt
Source
SCIENTIFIC REPORTS | 2022 / Vol. 12 / Issue 01
Keywords
FEATURE-SELECTION; IDENTIFICATION; CLASSIFIER; EXPRESSION; DATABASE;
DOI
10.1038/s41598-022-14127-8
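Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code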
07 ; 0710 ; 09 ;
Abstract
Identifying genes related to Parkinson's disease (PD) is an active research topic in biomedical analysis and plays a critical role in diagnosis and treatment. Many recent studies have proposed techniques for predicting disease-related genes, but few of these techniques are designed or developed for PD gene prediction. Most of the PD techniques identify only protein genes and discard long noncoding RNA (lncRNA) genes, which play an essential role in biological processes and in the transformation and development of diseases. This paper proposes a novel prediction system that identifies both protein and lncRNA genes related to PD and can aid early diagnosis. First, we preprocessed the genes into DNA FASTA sequences obtained from the University of California Santa Cruz (UCSC) genome browser and removed redundancies. Second, we extracted significant features from the DNA FASTA sequences using the PyFeat method, with AdaBoost for feature selection. The selected features achieved promising results compared with features extracted by some state-of-the-art feature extraction techniques. Finally, the features were fed to a gradient-boosted decision tree (GBDT) to diagnose different tested cases. Seven performance metrics were used to evaluate the proposed system. It achieved an average accuracy of 78.6%, an area under the curve (AUC) of 84.5%, an area under the precision-recall curve (AUPR) of 85.3%, an F1-score of 78.3%, a Matthews correlation coefficient (MCC) of 0.575, a sensitivity (SEN) of 77.1%, and a specificity (SPC) of 80.2%. The experiments demonstrate promising results compared with other systems. The predicted top-ranked protein and lncRNA genes were verified against the literature.
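Five of the seven metrics named in the abstract (accuracy, F1-score, MCC, sensitivity, and specificity) can be derived from a binary confusion matrix. The sketch below shows the standard textbook definitions in plain Python. It is not the authors' evaluation code: the function names `confusion_counts` and `threshold_metrics` are hypothetical, and the labels in the usage example are toy values, not data from the paper. AUC and AUPR are omitted because they require ranked prediction scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = PD-related, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(tp, tn, fp, fn):
    """Standard definitions; any ratio with a zero denominator is reported as 0.0."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall, true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if (precision + sensitivity)
        else 0.0
    )
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    for name, value in threshold_metrics(*confusion_counts(y_true, y_pred)).items():
        print(f"{name}: {value:.3f}")
```

Note that MCC ranges from -1 to 1 rather than 0 to 1, which is why the abstract reports it as 0.575 instead of as a percentage.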
Pages: 26
Related papers
50 in total
  • [31] Gradient-Boosted Decision Tree for Listwise Context Model in Multimodal Review Helpfulness Prediction
    Thong Nguyen
    Wu, Xiaobao
    Dong, Xinshuai
    Anh Tuan Luu
    Cong-Duy Nguyen
    Hai, Zhen
    Bing, Lidong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 1670 - 1696
  • [32] Financial distress prediction using a corrected feature selection measure and gradient boosted decision tree
    Qian, Hongyi
    Wang, Baohui
    Yuan, Minghe
    Gao, Songfeng
    Song, You
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 190
  • [33] Credit scoring based on a Bagging-cascading boosted decision tree
    Zou, Yao
    Gao, Changchun
    Xia, Meng
    Pang, Congyuan
    INTELLIGENT DATA ANALYSIS, 2022, 26 (06) : 1557 - 1578
  • [34] Predicting the risk of pipe failure using gradient boosted decision trees and weighted risk analysis
    Neal Andrew Barton
    Stephen Henry Hallett
    Simon Richard Jude
    Trung Hieu Tran
    npj Clean Water, 5
  • [35] Gait Analysis Based Approach for Parkinson's Disease Modeling with Decision Tree Classifiers
    Krajuskina, Anna
    Nomm, Sven
    Toomela, Aaro
    Medijainen, Kadri
    Tamm, Eveli
    Vaske, Martti
    Uvarov, Dan
    Kahar, Hedi
    Nugis, Marita
    Taba, Pille
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 3720 - 3725
  • [36] Predicting the risk of pipe failure using gradient boosted decision trees and weighted risk analysis
    Barton, Neal Andrew
    Hallett, Stephen Henry
    Jude, Simon Richard
    Tran, Trung Hieu
    NPJ CLEAN WATER, 2022, 5 (01)
  • [37] Estrogen related genes and Parkinson's disease
    Chung, S. J.
    Armasu, S. M.
    Biernacka, J. M.
    Lesnick, T. G.
    Rider, D. N.
    Cunningham, J. M.
    Rocca, W. A.
    Maraganore, D. M.
    PARKINSONISM & RELATED DISORDERS, 2009, 15 : S155 - S155
  • [38] Method Based on Floating Car Data and Gradient-Boosted Decision Tree Classification for the Detection of Auxiliary Through Lanes at Intersections
    Li, Xiaolong
    Wu, Yuzhen
    Tan, Yongbin
    Cheng, Penggen
    Wu, Jing
    Wang, Yuqian
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2018, 7 (08):
  • [39] Predicting Parkinson's Disease Genes Based on Node2vec and Autoencoder
    Peng, Jiajie
    Guan, Jiaojiao
    Shang, Xuequn
    FRONTIERS IN GENETICS, 2019, 10
  • [40] Feature Expansion with Word2Vec for Topic Classification with Gradient Boosted Decision Tree on Twitter
    Maulidia, Dhuhita Trias
    Setiawan, Erwin Budi
    2022 INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ITS APPLICATIONS (ICODSA), 2022, : 87 - 92