Classify Alzheimer genes association using Naive Bayes algorithm

Cited by: 0
Authors
Raj, Sushrutha [1]
Vishnoi, Anchal [2]
Srivastava, Alok [2, 3]
Affiliations
[1] Amity Univ Haryana, Amity Inst Integrat Sci & Hlth, Amity Educ Valley, Gurgaon 122413, India
[2] Sri Innovat & Res Fdn, Ghaziabad 201009, India
[3] L V Prasad Eye Inst, Hyderabad 500034, Telangana, India
Source
HUMAN GENE | 2024 / Vol. 41
Keywords
Disease gene associations; Alzheimer's candidate genes; Machine learning; Text mining; Text classification; Cross validation; TEXT-MINING SYSTEM; HUMAN-DISEASES; IDENTIFICATION; LINKAGE; DRUGS; PRIORITIZATION; GENOMICS; DATABASE; TARGETS;
DOI
10.1016/j.humgen.2024.201309
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007; 090102
Abstract
Background: Alzheimer's disease, the most common form of dementia, accounts for 60-80% of dementia cases, and its prevalence is projected to rise as populations age. By 2050, the number of individuals with Alzheimer's and dementia worldwide is expected to reach 152 million. Genetics contributes about 70% of the overall risk, so understanding the genetic basis of the disease is important for developing targeted interventions. This study presents a system that combines text mining and machine learning to identify and prioritize prospective candidate genes for Alzheimer's and to classify them into three weighted association classes.
Methods: The machine learning-based classifier was trained on a curated gold standard dataset and validated with 10-fold cross-validation, where it performed consistently across all folds. The resulting ensemble learning system uses text mining and a Bayesian classification algorithm to categorize PubMed abstracts into three groups: Yes, No, and Ambiguous. The classifier is then applied to previously unseen disease-specific data to predict disease-gene associations.
Results: With an average accuracy of 87.33% and a confidence level of 90.10% ± 0.142, the protocol extracted 2031 associated genes, of which 1162, 489 and 1439 belong to the positive, negative and ambiguous classes respectively at a threshold of 0.9. Compared against established disease-gene databases, the system identified 915 positive genes that had not been previously reported. The positive genes can be used for in-depth study, and the ambiguous genes for further exploration of their association with Alzheimer's disease.
Conclusions: The system's ability to generate accurate predictions demonstrates its robustness and provides insight into the genetic factors of Alzheimer's disease. Consequently, this study contributes to existing knowledge and paves the way for future research in this field.
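Illustrative note (not part of the published record): the abstract describes a Bayesian classifier that sorts abstracts into Yes, No, and Ambiguous classes and reports results at a threshold of 0.9. The sketch below shows one way such a step could look, using a multinomial Naive Bayes model with Laplace smoothing written in plain Python. The training sentences, the function names, and the reading of the threshold as a minimum posterior probability are all assumptions made for illustration; the authors' actual features, data, and thresholding rule are not given in the abstract.

```python
import math
import re
from collections import Counter

# Hypothetical toy sentences; the paper's gold standard dataset is not shown here.
TRAIN = [
    ("APOE variants are significantly associated with Alzheimer disease risk", "Yes"),
    ("mutations in PSEN1 cause early onset Alzheimer disease", "Yes"),
    ("no association was found between the gene and Alzheimer disease", "No"),
    ("the polymorphism was not associated with disease risk", "No"),
    ("the role of this gene remains unclear and results are conflicting", "Ambiguous"),
    ("evidence is inconsistent and further studies are needed", "Ambiguous"),
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def train(docs):
    """Count documents per class and token occurrences per class."""
    class_docs, word_counts, vocab = Counter(), {}, set()
    for text, label in docs:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab

def posteriors(model, text):
    """Return P(class | text) for every class, computed in log space."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    tokens = [t for t in tokenize(text) if t in vocab]  # skip unseen words
    log_scores = {}
    for label, n_docs in class_docs.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # Laplace (add-one) smoothing
        score = math.log(n_docs / total_docs)      # class prior
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def classify(model, text, threshold=0.9):
    """Return (label, probability, confident); the threshold rule is an assumption."""
    probs = posteriors(model, text)
    label = max(probs, key=probs.get)
    return label, probs[label], probs[label] >= threshold

model = train(TRAIN)

if __name__ == "__main__":
    for sentence in ("no association was found",
                     "results remain unclear and conflicting"):
        label, prob, confident = classify(model, sentence)
        print(f"{sentence!r}: {label} (p={prob:.3f}, confident={confident})")
```

In this sketch a prediction counts as confident only when the top posterior reaches the threshold, so a sentence can receive a class label and still be excluded at 0.9. A 10-fold cross-validation like the one the abstract mentions would repeat `train` and `classify` over ten held-out partitions of a much larger labeled set.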
Pages: 12
Related papers
50 in total
  • [21] Mining housekeeping genes with a Naive Bayes classifier
    Luna De Ferrari
    Stuart Aitken
    BMC Genomics, 7
  • [22] USING FUZZY-ROUGH SET EVALUATION FOR FEATURE SELECTION AND NAIVE BAYES TO CLASSIFY THE PARKINSON DISEASE
    Lanbaran, Naiyer Mohammadi
    Celik, Ercan
    Kotan, Ozgur
    MISKOLC MATHEMATICAL NOTES, 2022, 23 (02) : 787 - 800
  • [23] SMS Classification Method for Disaster Response using Naive Bayes Algorithm
    Ordonez, Aris J.
    Paje, Rommel Evan J.
    Naz, Rodel N.
    2018 INTERNATIONAL SYMPOSIUM ON COMPUTER, CONSUMER AND CONTROL (IS3C 2018), 2018, : 233 - 236
  • [24] Using Naive Bayes Algorithm to Students' bachelor Academic Performances Analysis
    Razaque, Fahad
    Soomro, Nareena
    Shaikh, Shoaib Ahmed
    Soomro, Safeeullah
    Samo, Javed Ahmed
    Kumar, Natesh
    Dharejo, Huma
    2017 4TH IEEE INTERNATIONAL CONFERENCE ON ENGINEERING TECHNOLOGIES AND APPLIED SCIENCES (ICETAS), 2017,
  • [25] A Voice Activity Detector using SVM and Naive Bayes Classification Algorithm
    Selvakumari, N. A. Sheela
    Radha, V.
    PROCEEDINGS OF 2017 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION (ICSPC'17), 2017, : 1 - 6
  • [26] Using association features to enhance the performance of naive Bayes text classifier
    Zhang, Y
    Zhang, LJ
    Yan, JF
    Li, ZH
    ICCIMA 2003: FIFTH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND MULTIMEDIA APPLICATIONS, PROCEEDINGS, 2003, : 336 - 341
  • [27] Hierarchical Naive Bayes for genetic association studies
    Malovini, Alberto
    Barbarini, Nicola
    Bellazzi, Riccardo
    BMC BIOINFORMATICS, 2012, 13
  • [28] Feature selection for optimizing the Naive Bayes algorithm
    Winarti, Titin
    Vydia, Vensy
    ENGINEERING, INFORMATION AND AGRICULTURAL TECHNOLOGY IN THE GLOBAL DIGITAL REVOLUTION, 2020, : 47 - 51
  • [29] Transfer Naive Bayes algorithm with group probabilities
    Li, Jingmei
    Wu, Weifei
    Xue, Di
    APPLIED INTELLIGENCE, 2020, 50 (01) : 61 - 73