A scalable and real-time system for disease prediction using big data processing

Authors
Abderrahmane Ed-daoudy
Khalil Maalmi
Aziza El Ouaazizi
Affiliations
[1] National School of Applied Sciences (ENSA), Artificial Intelligence, Data Sciences and Emerging Systems Laboratory (LIASSE)
[2] Sidi Mohamed Ben Abdellah University
Source
MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (20)
Keywords
Real-time; Streaming processing; Machine learning; MLlib; Apache Spark; Tweet processing
DOI
Not available
Abstract
The growing number of patients with chronic diseases and the centralization of medical resources have a significant economic impact in the form of hospital visits, hospital readmissions, and other healthcare costs. This paper proposes a scalable, real-time system for disease prediction from medical data streams. The system integrates Twitter, Apache Kafka, Apache Spark, and Apache Cassandra. Twitter users tweet health-related attributes; Kafka receives the desired tweet attributes and ingests them into Spark Streaming. There, a machine learning algorithm is applied to predict health status, and a response message is sent back through Kafka. The heart disease dataset from the UCI repository was used for the experiments. To enhance prediction accuracy, the Relief algorithm is used for feature selection. We compared six types of relevant machine learning algorithms implemented in Spark MLlib, namely Random Forest (RF), Naive Bayes, Support Vector Machine, Multilayer Perceptron, Decision Tree, and Logistic Regression, with both the full feature set and the selected features. The highest classification accuracy, 92.05%, was obtained using RF with the selected features. The scalability of RF in Spark MLlib and in the WEKA framework was measured for both the training and application stages. The results show that Spark performs significantly better in terms of scalability and computing time.
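The abstract names Relief as the feature selection step but does not say which variant or parameters were used. As a rough illustration only, the following is a minimal sketch of the basic two-class Relief idea in plain Python: a feature gains weight when it differs between an instance and its nearest neighbor of the other class (nearest miss) and loses weight when it differs from its nearest neighbor of the same class (nearest hit). The function name `relief_weights`, its parameters, and the toy data are all hypothetical and are not taken from the paper.

```python
import random


def relief_weights(X, y, n_samples=None, seed=0):
    """Basic two-class Relief (illustrative sketch, not the paper's code).

    X is a list of numeric feature rows, y a list of class labels.
    Returns one weight per feature; larger means more relevant.
    """
    n, d = len(X), len(X[0])
    # Per-feature ranges, used to scale differences into [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    rng = random.Random(seed)
    # By default visit every instance once; otherwise sample with replacement.
    if n_samples is None:
        picks = list(range(n))
    else:
        picks = [rng.randrange(n) for _ in range(n_samples)]
    m = len(picks)

    w = [0.0] * d
    for i in picks:
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # cannot score an instance without both neighbors
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / m
    return w


# Toy data (made up): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2], [1.0, 0.6]]
y = [0, 0, 0, 1, 1, 1]

weights = relief_weights(X, y)
ranking = sorted(range(len(weights)), key=lambda j: -weights[j])
print(weights, ranking)
```

In a setting like the one the abstract describes, the top-ranked features would then be kept as the "selected features" passed to the classifiers; how many to keep is a tuning choice that this record does not specify.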
Pages: 30405 - 30434
Page count: 29
Related papers
50 results in total
  • [1] A scalable and real-time system for disease prediction using big data processing
    Ed-daoudy, Abderrahmane
    Maalmi, Khalil
    El Ouaazizi, Aziza
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (20) : 30405 - 30434
  • [2] Real-time prediction of accident using Big data system
    Tantaoui, Mouad
    Laanaoui, My Driss
    Kabil, Mustapha
    3RD INTERNATIONAL CONFERENCE ON NETWORKING, INFORMATION SYSTEM & SECURITY (NISS'20), 2020,
  • [3] Real-time private consumption prediction using big data
    Shin, Seung Jun
    Seo, Beomseok
    KOREAN JOURNAL OF APPLIED STATISTICS, 2024, 37 (01) : 13 - 38
  • [4] Real-time stream processing for Big Data
    Wingerath, Wolfram
    Gessert, Felix
    Friedrich, Steffen
    Ritter, Norbert
    IT-INFORMATION TECHNOLOGY, 2016, 58 (04) : 186 - 194
  • [5] Real-time processing of streaming big data
    Safaei, Ali A.
    REAL-TIME SYSTEMS, 2017, 53 (01) : 1 - 44
  • [6] Real-time processing of streaming big data
    Ali A. Safaei
    Real-Time Systems, 2017, 53 : 1 - 44
  • [7] Scalable Containerized Pipeline for Real-time Big Data Analytics
    Aurangzaib, Rana
    Iqbal, Waheed
    Abdullah, Muhammad
    Bukhari, Faisal
    Ullah, Faheem
    Erradi, Abdelkarim
    2022 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM 2022), 2022, : 25 - 32
  • [8] Big Data Real-time Processing Based on Storm
    Yang, Wenjie
    Liu, Xingang
    Zhang, Lan
    Yang, Laurence T.
    2013 12TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2013), 2013, : 1784 - 1787
  • [9] Survey of Real-time Processing Systems for Big Data
    Liu, Xiufeng
    Iftikhar, Nadeem
    Xie, Xike
    PROCEEDINGS OF THE 18TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM (IDEAS14), 2014, : 356 - 361
  • [10] Processing of real-time data in big manufacturing systems
    Benesch, Manfred
    Kubin, Hellmuth
    Kabitzsch, Klaus
    27TH INTERNATIONAL CONFERENCE ON FLEXIBLE AUTOMATION AND INTELLIGENT MANUFACTURING, FAIM2017, 2017, 11 : 2114 - 2122