Anonymization Framework for Securing Protected Health Information in a Complex Dataset of Medical Narratives

Cited by: 1
Authors
Hina, Saman [1]
Asif, Raheela [2]
Ali, Syed Abbas [3]
Affiliations
[1] NED Univ Engn & Technol, Dept Comp Sci & Informat Technol, Karachi, Pakistan
[2] NED Univ Engn & Technol, Dept Software Engn, Karachi, Pakistan
[3] NED Univ Engn & Technol, Dept Comp & Informat Syst Engn, Karachi, Pakistan
Keywords
Security; Anonymization; Medical Narratives; Classification; Protected Health Information; De-Identification; OF-THE-ART; SYSTEM;
DOI
10.22581/muet1982.2003.16
Chinese Library Classification
T [Industrial Technology];
Discipline classification code
08;
Abstract
In the medical domain, it is imperative that information protection leaves no individual overlooked. The research community encourages the use of real-time datasets for research purposes. These datasets contain structured and unstructured (natural-language free text) information that can be useful to researchers in various disciplines, including computational linguistics. However, they cannot be distributed without anonymization of Protected Health Information (PHI), because disclosing PHI that can identify an individual (such as name, age or address) is unethical. Therefore, we present a rule-based Natural Language Processing (NLP) anonymization system developed on a challenging corpus containing medical narratives and ICD-10 medical codes. This anonymization module can be used to pre-process a corpus containing identifiable information. The corpus used in this research contains 2534 PHIs in 1984 medical records in total. Of the labelled corpus, 15% was used to improve the guidelines for identifying and classifying PHI groups, and 85% was held out for evaluation. Our anonymization system follows a two-step process: (1) identification and cataloging of PHIs into four PHI categories ('Patients Name', 'Doctors Name', 'Other Name' [names other than patients and doctors], 'Place Name'); (2) anonymization of PHIs by replacing each identified PHI with its PHI category. Our method uses basic language processing, dictionaries, rules and heuristics to identify, classify and anonymize PHIs. Evaluated with standard metrics against a human-annotated gold standard, our system achieves an F-measure of 100%, an increase of 39% over the baseline results, which supports the reliability of the anonymized data for research use.
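The two-step process described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' system: the abstract does not give their dictionaries, rules or heuristics, so the cue words, the place list, the regular expressions and the function names (`identify`, `anonymize`, `f_measure`) below are all hypothetical stand-ins chosen for illustration. Only the four category labels and the use of the F-measure come from the abstract.

```python
import re

# A capitalized word sequence; a crude, assumed stand-in for name detection.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Hypothetical cue-word rules: group 1 is the cue, group 2 is the PHI span.
RULES = [
    ("Doctors Name", re.compile(rf"\b(Dr\.?|Doctor) ({NAME})")),
    ("Patients Name", re.compile(rf"\b(Patient|Mr\.?|Mrs\.?|Ms\.?) ({NAME})")),
    ("Other Name", re.compile(rf"\b(son|daughter|wife|husband|brother|sister) ({NAME})")),
]

# Hypothetical place dictionary (the paper's real lexicon is not given).
PLACES = ("Karachi", "Lahore")
PLACE_RE = re.compile(r"\b(" + "|".join(map(re.escape, PLACES)) + r")\b")


def identify(text):
    """Step 1: return sorted (start, end, category) spans for each PHI found."""
    spans = []
    for category, pattern in RULES:
        for match in pattern.finditer(text):
            spans.append((match.start(2), match.end(2), category))
    for match in PLACE_RE.finditer(text):
        spans.append((match.start(1), match.end(1), "Place Name"))
    return sorted(spans)


def anonymize(text):
    """Step 2: replace each identified PHI with its category label."""
    pieces, cursor = [], 0
    for start, end, category in identify(text):
        if start < cursor:  # skip spans overlapping one already replaced
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"[{category}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def f_measure(gold, predicted):
    """Balanced F-measure over sets of (start, end, category) spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    note = ("Patient Ahmed Khan was seen by Dr. Sara Malik in Karachi "
            "with his brother Bilal.")
    print(anonymize(note))
```

Replacing each span with its category label, rather than deleting it, keeps the sentence readable for downstream linguistic analysis. A real system would need much richer dictionaries and heuristics than these few cue words to handle a corpus of the size the abstract describes.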
Pages: 612-624
Page count: 13