A scalable and real-time system for disease prediction using big data processing

Authors
Abderrahmane Ed-daoudy
Khalil Maalmi
Aziza El Ouaazizi
Affiliations
[1] National School of Applied Sciences (ENSA), Artificial Intelligence, Data Sciences and Emerging Systems Laboratory (LIASSE)
[2] Sidi Mohamed Ben Abdellah University
Keywords
Real-time; Streaming processing; Machine learning; MLlib; Apache Spark; Tweet processing
Abstract
The growing number of patients with chronic diseases and the centralization of medical resources have a significant economic impact in the form of hospital visits, hospital readmissions, and other healthcare costs. This paper proposes a scalable, real-time system for disease prediction from medical data streams, built by integrating Twitter, Apache Kafka, Apache Spark and Apache Cassandra. Twitter users tweet health-related attributes; Kafka receives the desired tweet attributes and ingests them into Spark Streaming, where a machine learning algorithm predicts health status and sends a response message back through Kafka. The heart disease dataset from the UCI repository was used for the experiments. To enhance prediction accuracy, the Relief algorithm is used for feature selection. We compared six types of machine learning algorithms implemented in Spark MLlib, namely Random Forest (RF), Naive Bayes, Support Vector Machine, Multilayer Perceptron, Decision Tree and Logistic Regression, with the full feature set as well as with the selected features. The highest classification accuracy, 92.05%, was obtained using RF with selected features. The scalability of RF in Spark MLlib and in the WEKA framework was measured for both the training and application stages. The results show that Spark performs significantly better in terms of scalability and computing time.
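The feature-selection step named in the abstract can be illustrated with a minimal sketch of the textbook Relief algorithm for binary classification with numeric features. The abstract does not state which Relief variant or parameters the authors used, so everything here is an assumption for illustration: the function name `relief`, the parameters `n_iter` and `seed`, and the six-row toy dataset are hypothetical and are not taken from the paper or from the UCI heart disease data.

```python
import random


def relief(X, y, n_iter=None, seed=0):
    """Textbook Relief weights for numeric features and two classes.

    Hypothetical helper for illustration; not the authors' implementation.
    A higher weight suggests a feature that separates the classes better.
    """
    n, d = len(X), len(X[0])
    rng = random.Random(seed)

    # Per-feature range, used to scale differences into [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    m = n_iter or n
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)  # sample one instance at random
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        near_hit = min(hits, key=lambda k: dist(X[i], X[k]))
        near_miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward features that differ on the nearest miss,
            # penalise features that differ on the nearest hit.
            w[j] += (diff(j, X[i], X[near_miss]) - diff(j, X[i], X[near_hit])) / m
    return w


# Toy data (made up): feature 0 tracks the class label, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief(X, y)
# Indices of features ordered from most to least relevant.
ranking = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
```

On this toy data the informative feature receives the larger weight, so a selection step would keep feature 0 and could drop feature 1 before training a classifier.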
Pages: 30405-30434
Page count: 29