Enhancing random forest classification with NLP in DAMEH: A system for DAta Management in eHealth Domain

Cited by: 8
Authors
Amato, Flora [1 ]
Coppolino, Luigi [2 ]
Cozzolino, Giovanni [1 ]
Mazzeo, Giovanni [1 ]
Moscato, Francesco [3 ]
Nardone, Roberto [4 ]
Affiliations
[1] Univ Naples Federico II, DIETI, Naples, Italy
[2] Univ Naples Parthenope, DI, Naples, Italy
[3] Univ Salerno, DIEM, Fisciano, Italy
[4] Univ Mediterranea Reggio Calabria, DIIES, Reggio Di Calabria, Italy
Keywords
Big data processing; E-health; Machine learning; Random forests; Multi-classification schema; Feature selection; Architecture
DOI
10.1016/j.neucom.2020.08.091
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The use of pervasive IoT devices in Smart Cities has increased the volume of data produced in many fields. Interesting and useful applications are multiplying in the e-health domain, where smart devices manage huge amounts of data in highly distributed environments in order to provide smart services that collect data to populate patients' medical records. The problem is to gather data, produce records, and analyze medical records according to their contents. Since data gathering involves very different devices (not only wearable medical sensors, but also environmental smart devices such as weather, pollution, and other sensors), it is very difficult to classify data by content in a way that enables better patient management. Data from smart devices are combined with medical records written in natural language: we describe an architecture that determines the best features for classification from existing medical records. The architecture relies on a pre-filtering phase based on Natural Language Processing, which enhances Machine Learning classification based on Random Forests. We carried out experiments on about 5000 medical records from real (anonymized) case studies provided by various health-care organizations in Italy. We report the accuracy of the presented approach in terms of Accuracy-Rejection curves. (c) 2021 Elsevier B.V. All rights reserved.
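The abstract reports results as Accuracy-Rejection curves. As a rough illustration only, the following sketch shows how such a curve can be computed from the per-tree votes of a Random Forest: predictions are ranked by confidence, the least confident fraction is rejected, and accuracy is measured on what remains. It assumes that the confidence of a prediction is the share of trees voting for the winning class; the abstract does not say which confidence score the authors used. The function names and the class labels (`cardio`, `neuro`, `pulmo`) are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def forest_prediction(tree_votes: Sequence[str]) -> Tuple[str, float]:
    """Majority vote over the trees of a random forest.

    Returns the winning label and its vote share. Using the vote share as
    the confidence score is an assumption of this sketch.
    """
    counts = Counter(tree_votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(tree_votes)


def accuracy_rejection_curve(
    votes: Sequence[Sequence[str]],
    truth: Sequence[str],
    rejection_rates: Sequence[float],
) -> List[Tuple[float, float]]:
    """Accuracy on the samples kept after rejecting the least confident ones.

    For each rejection rate r, the int(r * n) lowest-confidence predictions
    are discarded and accuracy is computed on the remaining ones.
    """
    scored = []
    for tree_votes, y in zip(votes, truth):
        label, conf = forest_prediction(tree_votes)
        scored.append((conf, label == y))
    # Most confident first, so rejecting simply truncates the tail.
    # The sort is stable: ties keep their input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    n = len(scored)
    curve = []
    for r in rejection_rates:
        keep = n - int(r * n)
        if keep == 0:
            continue  # accuracy is undefined when everything is rejected
        kept = scored[:keep]
        accuracy = sum(correct for _, correct in kept) / keep
        curve.append((r, accuracy))
    return curve


if __name__ == "__main__":
    # Hypothetical votes from a 4-tree forest for 4 records.
    votes = [
        ["cardio", "cardio", "cardio", "cardio"],
        ["neuro", "neuro", "neuro", "cardio"],
        ["pulmo", "pulmo", "pulmo", "neuro"],
        ["cardio", "cardio", "neuro", "pulmo"],
    ]
    truth = ["cardio", "neuro", "neuro", "pulmo"]
    # Accuracy rises as the least confident predictions are rejected.
    print(accuracy_rejection_curve(votes, truth, [0.0, 0.25, 0.5, 0.75]))
```

A curve that climbs as the rejection rate grows indicates that the confidence score separates reliable predictions from unreliable ones, which is the property an Accuracy-Rejection plot is meant to expose.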
Pages: 79-91
Page count: 13