Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries

Cited by: 4
Authors
Roechner, Philipp [1]
Rothlauf, Franz [1]
Affiliations
[1] Johannes Gutenberg Univ Mainz, Informat Syst & Business Adm, Jakob Welder Weg 9, D-55128 Mainz, Germany
Keywords
Anomaly detection; Outlier detection; Data quality; Quality control; Electronic health records; Medical records; Cancer registration; Neural network; Machine learning; Artificial intelligence
DOI
10.1186/s12874-023-01946-0
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
Background: Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. When processing information, cancer registries verify that the patient-specific records they collect are plausible, that is, that the collected information about a particular patient makes medical sense.
Methods: Unsupervised machine learning approaches can detect implausible electronic health records without human guidance. Therefore, this article investigates two unsupervised anomaly detection approaches, a pattern-based approach (FindFPOF) and a compression-based approach (autoencoder), to identify implausible electronic health records in cancer registries. Unlike most existing work that analyzes synthetic anomalies, we compare the performance of both approaches and a baseline (random selection of records) on a real-world dataset. The dataset contains 21,104 electronic health records of patients with breast, colorectal, and prostate tumors. Each record consists of 16 categorical variables describing the disease, the patient, and the diagnostic procedure. The samples identified by FindFPOF, the autoencoder, and a random selection (a total of 785 different records) are evaluated in a real-world scenario by medical domain experts.
Results: Both anomaly detection methods are good at detecting implausible electronic health records. First, domain experts identified 8% of 300 randomly selected records as implausible. With FindFPOF and the autoencoder, 28% of the proposed 300 records in each sample were implausible. This corresponds to a precision of 28% for FindFPOF and the autoencoder. Second, for 300 randomly selected records that were labeled by domain experts, the sensitivity of the autoencoder was 22% and the sensitivity of FindFPOF was 26%. Both anomaly detection methods had a specificity of 94%. Third, FindFPOF and the autoencoder suggested samples with a different distribution of values than the overall dataset. For example, both anomaly detection methods suggested a higher proportion of colorectal records, the tumor localization with the highest percentage of implausible records in a randomly selected sample.
Conclusions: Unsupervised anomaly detection can significantly reduce the manual effort of domain experts to find implausible electronic health records in cancer registries. In our experiments, the manual effort was reduced by a factor of approximately 3.5 compared to evaluating a random sample.
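The reported effort reduction can be checked with a short calculation. The sketch below is not the authors' code; it assumes that the factor of approximately 3.5 is the ratio of the detectors' precision (28%) to the share of implausible records in the random sample (8%), which is one reading consistent with the figures in the abstract. The function names are hypothetical and used only for illustration.

```python
def effort_reduction(precision_detector: float, precision_baseline: float) -> float:
    """Ratio of two precisions: how many times fewer records an expert
    must review per implausible record found (assumed interpretation)."""
    if not 0 < precision_baseline <= 1 or not 0 < precision_detector <= 1:
        raise ValueError("precisions must lie in (0, 1]")
    return precision_detector / precision_baseline


def records_per_hit(precision: float) -> float:
    """Expected number of records reviewed per implausible record found."""
    if not 0 < precision <= 1:
        raise ValueError("precision must lie in (0, 1]")
    return 1.0 / precision


# Percentages as reported in the abstract.
baseline_precision = 0.08   # implausible share in 300 randomly selected records
detector_precision = 0.28   # implausible share in 300 records proposed by each detector

factor = effort_reduction(detector_precision, baseline_precision)
reviews_random = records_per_hit(baseline_precision)
reviews_detector = records_per_hit(detector_precision)

print(f"effort reduction factor: {factor:.1f}")
print(f"records reviewed per implausible record (random): {reviews_random:.1f}")
print(f"records reviewed per implausible record (detector): {reviews_detector:.1f}")
```

Under this reading, an expert reviewing a random sample inspects about 12.5 records per implausible record found, compared with roughly 3.6 when reviewing records proposed by either detector, which gives the ratio of about 3.5.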
Pages: 14