Unsupervised anomaly detection of implausible electronic health records: a real-world evaluation in cancer registries

Cited by: 4
Authors
Roechner, Philipp [1 ]
Rothlauf, Franz [1 ]
Affiliations
[1] Johannes Gutenberg Univ Mainz, Informat Syst & Business Adm, Jakob Welder Weg 9, D-55128 Mainz, Germany
Keywords
Anomaly detection; Outlier detection; Data quality; Quality control; Electronic health records; Medical records; Cancer registration; Neural network; Machine learning; Artificial intelligence;
DOI
10.1186/s12874-023-01946-0
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)];
Abstract
Background: Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. As part of this processing, cancer registries check that the patient-specific records they collect are plausible, that is, that the information recorded about a particular patient makes medical sense.

Methods: Unsupervised machine learning approaches can detect implausible electronic health records without human guidance. This article therefore investigates two unsupervised anomaly detection approaches, a pattern-based approach (FindFPOF) and a compression-based approach (autoencoder), for identifying implausible electronic health records in cancer registries. Unlike most existing work, which analyzes synthetic anomalies, we compare the performance of both approaches and a baseline (random selection of records) on a real-world dataset. The dataset contains 21,104 electronic health records of patients with breast, colorectal, and prostate tumors. Each record consists of 16 categorical variables describing the disease, the patient, and the diagnostic procedure. The samples identified by FindFPOF, the autoencoder, and random selection, a total of 785 different records, are evaluated in a real-world scenario by medical domain experts.

Results: Both anomaly detection methods find implausible electronic health records at a markedly higher rate than random selection. First, domain experts identified 8% of 300 randomly selected records as implausible. With FindFPOF and the autoencoder, 28% of the proposed 300 records in each sample were implausible. This corresponds to a precision of 28% for FindFPOF and the autoencoder. Second, for 300 randomly selected records that were labeled by domain experts, the sensitivity of the autoencoder was 22% and the sensitivity of FindFPOF was 26%. Both anomaly detection methods had a specificity of 94%. Third, FindFPOF and the autoencoder suggested samples with a different distribution of values than the overall dataset. For example, both anomaly detection methods suggested a higher proportion of colorectal records, the tumor localization with the highest percentage of implausible records in a randomly selected sample.

Conclusions: Unsupervised anomaly detection can significantly reduce the manual effort of domain experts to find implausible electronic health records in cancer registries. In our experiments, the manual effort was reduced by a factor of approximately 3.5 (28% versus 8% implausible records per reviewed sample) compared to evaluating a random sample.
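To make the pattern-based idea concrete, the following is a minimal sketch of a frequent pattern outlier factor on categorical records. It assumes the commonly used definition (the summed support of all frequent patterns contained in a record, divided by the number of frequent patterns), so that records matching few frequent patterns receive low scores. The abstract does not state the paper's FindFPOF configuration; the variable names (`site`, `sex`), the toy records, and the thresholds `min_support` and `max_len` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(records, min_support, max_len=2):
    """Return {pattern: support} for itemsets of up to max_len items
    whose relative frequency is at least min_support."""
    n = len(records)
    counts = Counter()
    for rec in records:
        # Sort (variable, value) pairs so each itemset has one canonical form.
        items = sorted(rec.items())
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def fpof(record, patterns):
    """Summed support of frequent patterns contained in the record,
    divided by the total number of frequent patterns."""
    if not patterns:
        return 0.0
    items = set(record.items())
    contained = sum(s for p, s in patterns.items() if set(p) <= items)
    return contained / len(patterns)


# Hypothetical toy data: the last record combines values that rarely co-occur.
records = (
    [{"site": "prostate", "sex": "male"}] * 4
    + [{"site": "breast", "sex": "female"}] * 4
    + [{"site": "prostate", "sex": "female"}]
)

patterns = frequent_patterns(records, min_support=0.3)
scores = [fpof(r, patterns) for r in records]

# Lower score = fewer frequent patterns matched = more anomalous.
ranking = sorted(range(len(records)), key=lambda i: scores[i])
print(ranking[0], records[ranking[0]])
```

In this toy example the record with the rare value combination receives the lowest score and would be proposed first for expert review. A real registry dataset with 16 categorical variables would need an efficient frequent itemset miner rather than this brute-force enumeration.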
Pages: 14