Classify Alzheimer genes association using Naive Bayes algorithm

Cited by: 0

Authors:

Raj, Sushrutha ^{[1]}

Vishnoi, Anchal ^{[2]}

Srivastava, Alok ^{[2,3]}

Affiliations:

[1] Amity Univ Haryana, Amity Inst Integrat Sci & Hlth, Amity Educ Valley, Gurgaon 122413, India

[2] Sri Innovat & Res Fdn, Ghaziabad 201009, India

[3] L V Prasad Eye Inst, Hyderabad 500034, Telangana, India

Source:

HUMAN GENE | 2024 / Vol. 41

Keywords:

Disease gene associations; Alzheimer's candidate genes; Machine learning; Text mining; Text classification; Cross validation; TEXT-MINING SYSTEM; HUMAN-DISEASES; IDENTIFICATION; LINKAGE; DRUGS; PRIORITIZATION; GENOMICS; DATABASE; TARGETS;

DOI:

10.1016/j.humgen.2024.201309

Chinese Library Classification (CLC) number:

Q3 [Genetics];

Discipline classification code:

071007 ; 090102 ;

Abstract:

Background: Alzheimer's disease, the most common form of dementia, accounts for 60-80% of dementia cases, and its prevalence is projected to increase as populations age. By 2050, the number of individuals with Alzheimer's and dementia worldwide is expected to reach 152 million. Genetics plays a significant role, contributing about 70% of the overall risk, which underscores the importance of understanding the genetic basis of the disease for developing targeted interventions. This study presents a system that combines text mining and machine learning to identify and prioritize prospective candidate genes for Alzheimer's disease and to classify them into three weighted association classes.

Methods: The machine learning-based classifier was trained on a curated gold-standard dataset and validated with 10-fold cross-validation, which showed consistent performance across all folds. The resulting ensemble learning system uses text mining and a Bayesian classification algorithm to categorize PubMed abstracts into three groups: Yes, No, and Ambiguous. The classifier is then applied to previously unseen disease-specific data to predict disease-gene associations.

Results: With an average accuracy of 87.33% and a confidence level of 90.10% +/- 0.142, the protocol extracted 2031 associated genes, of which 1162, 489 and 1439 belong to the positive, negative and ambiguous classes, respectively, at a threshold of 0.9. Compared with established disease-gene databases, the system identified 915 positive genes that had not been previously reported. The positive genes can support in-depth study, and the ambiguous genes are candidates for further exploration of their association with Alzheimer's disease.

Conclusions: The system's ability to generate accurate predictions demonstrates its robustness and provides insight into the genetic factors of Alzheimer's disease. Consequently, this study contributes to existing knowledge and paves the way for future research in this field.
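
The abstract describes a Bayesian text classifier that sorts abstracts into Yes, No, and Ambiguous classes, is checked by k-fold cross-validation, and applies a 0.9 threshold. The sketch below illustrates that general idea with a small multinomial Naive Bayes written in plain Python. It is not the authors' implementation: the toy sentences in `TOY_DOCS`, the whitespace tokenizer, the Laplace smoothing, the 3-fold split (the study used 10 folds), and the reading of the threshold as a minimum posterior probability are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Invented toy sentences (NOT data from the study), three per class.
TOY_DOCS = [
    ("gene variant increases alzheimer risk", "Yes"),
    ("mutation associated with alzheimer onset", "Yes"),
    ("gene strongly associated with disease risk", "Yes"),
    ("no association found between gene and alzheimer", "No"),
    ("gene not associated with disease", "No"),
    ("no evidence of association", "No"),
    ("association remains unclear", "Ambiguous"),
    ("results inconclusive further study needed", "Ambiguous"),
    ("possible but unconfirmed association", "Ambiguous"),
]

def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()

def train(docs):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab

def predict_proba(model, text):
    """Posterior P(class | text) under multinomial Naive Bayes."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    log_scores = {}
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                # Laplace (add-one) smoothed log likelihood
                score += math.log((word_counts[label][tok] + 1)
                                  / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize in a numerically stable way (log-sum-exp).
    top = max(log_scores.values())
    exps = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

def classify(model, text, threshold=0.9):
    """Return (label, prob); label is None if prob is below the threshold."""
    probs = predict_proba(model, text)
    label = max(probs, key=probs.get)
    prob = probs[label]
    return (label if prob >= threshold else None, prob)

def cross_validate(docs, k):
    """Accuracy over k folds; fold i holds out every k-th document."""
    correct = 0
    for i in range(k):
        held_out = docs[i::k]
        training = [d for j, d in enumerate(docs) if j % k != i]
        model = train(training)
        for text, label in held_out:
            probs = predict_proba(model, text)
            correct += max(probs, key=probs.get) == label
    return correct / len(docs)

if __name__ == "__main__":
    model = train(TOY_DOCS)
    print(classify(model, "gene associated with alzheimer risk"))
    print("3-fold accuracy:", cross_validate(TOY_DOCS, 3))
```

With only nine invented sentences the reported accuracy is meaningless; the point is the shape of the pipeline: train on labeled text, estimate held-out accuracy fold by fold, and keep only predictions whose posterior clears the threshold.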


Pages: 12
