Classify Alzheimer genes association using Naive Bayes algorithm

Authors
Raj, Sushrutha [1]
Vishnoi, Anchal [2]
Srivastava, Alok [2, 3]
Affiliations
[1] Amity Univ Haryana, Amity Inst Integrat Sci & Hlth, Amity Educ Valley, Gurgaon 122413, India
[2] Sri Innovat & Res Fdn, Ghaziabad 201009, India
[3] L V Prasad Eye Inst, Hyderabad 500034, Telangana, India
Source
HUMAN GENE | 2024 / Volume 41
Keywords
Disease gene associations; Alzheimer's candidate genes; Machine learning; Text mining; Text classification; Cross validation; TEXT-MINING SYSTEM; HUMAN-DISEASES; IDENTIFICATION; LINKAGE; DRUGS; PRIORITIZATION; GENOMICS; DATABASE; TARGETS;
DOI
10.1016/j.humgen.2024.201309
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Background: Alzheimer's disease, the most common form of dementia, accounts for 60-80% of dementia cases, and its prevalence is projected to increase as populations age. By 2050, the number of individuals with Alzheimer's and dementia worldwide is expected to reach 152 million. Genetics plays a significant role, contributing about 70% of the overall risk, which underscores the importance of understanding the genetic basis of the disease for developing targeted interventions. This study presents a system that combines text mining and machine learning to identify and prioritize prospective candidate genes for Alzheimer's disease and to classify them into three weighted association classes.
Methods: The machine learning classifier was trained on a curated gold-standard dataset and validated with 10-fold cross-validation, which showed consistent performance across all folds. The resulting ensemble learning system categorizes PubMed abstracts into three groups (Yes, No, and Ambiguous) using text mining and a Bayesian classification algorithm. The trained classifier is then applied to previously unseen, disease-specific data to predict disease-gene associations.
Results: With an average accuracy of 87.33% and a confidence level of 90.10% +/- 0.142, the protocol extracted 2031 associated genes, of which 1162, 489 and 1439 belong to the positive, negative and ambiguous classes respectively at a threshold of 0.9. Compared against established disease-gene databases, the system identified 915 positive genes that had not been previously reported. The positive genes can support in-depth study, and the ambiguous genes are candidates for further exploration of their association with Alzheimer's disease.
Conclusions: The system's ability to generate accurate predictions demonstrates its robustness and provides insight into the genetic factors of Alzheimer's disease. Consequently, this study contributes to existing knowledge and paves the way for future research in this field.
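
The workflow described in the abstract (a Bayesian text classifier over three classes, k-fold cross-validation, and a posterior threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `NaiveBayesText`, the helper `k_fold_accuracy`, the whitespace tokenization, the add-one smoothing, and the toy sentences are all assumptions made for illustration, and the real system works on curated PubMed abstracts.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict_proba(self, doc):
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)  # log prior
            for w in doc.lower().split():
                if w in self.vocab:  # words never seen in training are skipped
                    num = self.word_counts[c][w] + 1
                    score += math.log(num / (total_words + len(self.vocab)))
            log_scores[c] = score
        # Normalize in log space so the posteriors sum to 1.
        top = max(log_scores.values())
        norm = sum(math.exp(s - top) for s in log_scores.values())
        return {c: math.exp(s - top) / norm for c, s in log_scores.items()}

    def predict(self, doc, threshold=0.0):
        """Return the most probable class, or None if it is below threshold."""
        probs = self.predict_proba(doc)
        best = max(probs, key=probs.get)
        return best if probs[best] >= threshold else None


def k_fold_accuracy(docs, labels, k=10):
    """Mean accuracy over k folds; fold f tests on items f, f+k, f+2k, ..."""
    n = len(docs)
    accuracies = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        if not test_idx:  # more folds than documents
            continue
        train_idx = [i for i in range(n) if i not in test_idx]
        model = NaiveBayesText().fit(
            [docs[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        correct = sum(model.predict(docs[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Hypothetical toy sentences standing in for labeled abstracts.
docs = [
    "gene variant is associated with increased alzheimer risk",
    "mutation strongly associated with disease onset",
    "no association found between gene and disease",
    "variant showed no significant association",
    "association remains unclear and requires further study",
    "results were inconclusive and may suggest a link",
]
labels = ["Yes", "Yes", "No", "No", "Ambiguous", "Ambiguous"]

model = NaiveBayesText().fit(docs, labels)
print(model.predict_proba("no association found"))
print(model.predict("no association found", threshold=0.9))
print(k_fold_accuracy(docs, labels, k=3))
```

In this sketch, a prediction whose highest posterior falls below the threshold is returned as `None` rather than assigned to a class, which is one simple way to keep only high-confidence associations. How the published system applies its 0.9 threshold and computes its weights is not specified in the abstract.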
Pages: 12