Classify Alzheimer genes association using Naive Bayes algorithm

Authors
Raj, Sushrutha [1]
Vishnoi, Anchal [2]
Srivastava, Alok [2, 3]
Affiliations
[1] Amity Univ Haryana, Amity Inst Integrat Sci & Hlth, Amity Educ Valley, Gurgaon 122413, India
[2] Sri Innovat & Res Fdn, Ghaziabad 201009, India
[3] L V Prasad Eye Inst, Hyderabad 500034, Telangana, India
Source
HUMAN GENE | 2024 / Volume 41
Keywords
Disease gene associations; Alzheimer's candidate genes; Machine learning; Text mining; Text classification; Cross validation; TEXT-MINING SYSTEM; HUMAN-DISEASES; IDENTIFICATION; LINKAGE; DRUGS; PRIORITIZATION; GENOMICS; DATABASE; TARGETS;
DOI
10.1016/j.humgen.2024.201309
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification codes
071007 ; 090102 ;
Abstract
Background: Alzheimer's disease, the most common form of dementia, accounts for 60-80% of cases, and its prevalence is projected to increase as populations age. By 2050, the number of individuals with Alzheimer's and dementia worldwide is expected to reach 152 million. Genetics contributes about 70% of the overall risk, which underscores the importance of understanding the genetic basis of the disease for developing targeted interventions. This study presents a system that combines text mining and machine learning to identify and prioritize prospective candidate genes for Alzheimer's disease and to classify them into three weighted association classes.
Methods: The machine learning classifier was trained on a curated gold standard dataset and validated using 10-fold cross-validation, which showed consistent performance across all folds. This ensemble learning system categorizes PubMed abstracts into three groups (Yes, No, and Ambiguous) using text mining and a Bayesian classification algorithm. The trained classifier is then used to predict disease-gene associations in previously unseen, disease-specific prediction data.
Results: With an average accuracy of 87.33% and a confidence level of 90.10% +/- 0.142, the protocol extracted 2031 associated genes, of which 1162, 489, and 1439 belong to the positive, negative, and ambiguous classes respectively at a threshold of 0.9. Compared with established disease-gene databases, the system identified 915 positive genes that had not been previously reported. The positive genes can support in-depth study, and the ambiguous genes are candidates for further exploration of their association with Alzheimer's disease.
Conclusions: The system's ability to generate accurate predictions demonstrates its robustness and provides insight into the genetic factors of Alzheimer's disease. Consequently, this study contributes to existing knowledge and paves the way for future research in this field.
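The classification step described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names only "a Bayesian classification algorithm" with Yes, No, and Ambiguous classes and a threshold of 0.9, so the multinomial Naive Bayes variant, the Laplace smoothing, the tokenizer, the class name `NaiveBayesText`, and the toy training sentences below are all assumptions made for illustration. How the paper applies its threshold is not stated; here a prediction is simply left unassigned when the top posterior falls below it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        n_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # class prior
            denom = self.totals[c] + self.alpha * len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
            log_scores[c] = score
        # Normalize in log space to avoid underflow.
        peak = max(log_scores.values())
        exp_scores = {c: math.exp(s - peak) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}

    def predict(self, text, threshold=0.9):
        # Returns (label, posterior); label is None below the threshold.
        probs = self.predict_proba(text)
        label = max(probs, key=probs.get)
        confident = probs[label] >= threshold
        return (label if confident else None, probs[label])


# Hypothetical toy sentences, not taken from the paper's gold standard dataset.
train_texts = [
    "APOE variant is associated with increased Alzheimer risk",
    "mutation in PSEN1 causes early onset Alzheimer disease",
    "no association between the gene and Alzheimer disease was found",
    "the variant was not associated with Alzheimer risk",
    "the role of the gene in Alzheimer disease remains unclear",
    "results on association with Alzheimer risk are conflicting",
]
train_labels = ["Yes", "Yes", "No", "No", "Ambiguous", "Ambiguous"]

model = NaiveBayesText().fit(train_texts, train_labels)
probs = model.predict_proba("mutation causes early onset disease")
best_label = max(probs, key=probs.get)
```

In a fuller pipeline, the same `fit` and `predict_proba` calls would be repeated over ten train/test splits to reproduce the 10-fold cross-validation the abstract mentions; that loop is omitted here to keep the sketch short.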
Pages: 12