Anonymization Framework for Securing Protected Health Information in a Complex Dataset of Medical Narratives

Cited by: 1
Authors
Hina, Saman [1]
Asif, Raheela [2]
Ali, Syed Abbas [3]
Affiliations
[1] NED Univ Engn & Technol, Dept Comp Sci & Informat Technol, Karachi, Pakistan
[2] NED Univ Engn & Technol, Dept Software Engn, Karachi, Pakistan
[3] NED Univ Engn & Technol, Dept Comp & Informat Syst Engn, Karachi, Pakistan
Keywords
Security; Anonymization; Medical Narratives; Classification; Protected Health Information; De-Identification; OF-THE-ART; SYSTEM;
DOI
10.22581/muet1982.2003.16
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
In the medical domain, it is imperative that information protection does not overlook any individual. The research community encourages the use of real-time datasets for research purposes. These datasets contain structured and unstructured (natural-language free text) information that can be useful to researchers in various disciplines, including computational linguistics. However, they cannot be distributed without anonymization of Protected Health Information (PHI), because disclosing PHI that can identify an individual (such as name, age, or address) is unethical. We therefore present a rule-based Natural Language Processing (NLP) anonymization system, developed on a challenging corpus containing medical narratives and ICD-10 codes (medical codes). This anonymization module can be used to pre-process a corpus containing identifiable information. The corpus used in this research contains 2534 PHIs across 1984 medical records. 15% of the labelled corpus was used to refine the guidelines for identifying and classifying PHI groups, and 85% was held out for evaluation. Our anonymization system follows a two-step process: (1) identification and cataloging of PHIs into four PHI categories ('Patients Name', 'Doctors Name', 'Other Name' [names other than patients and doctors], and 'Place Name'); (2) anonymization of PHIs by replacing each identified PHI with its respective PHI category. Our method uses basic language processing, dictionaries, rules, and heuristics to identify, classify, and anonymize PHIs. We use standard metrics for evaluation; measured against a human-annotated gold standard, our system achieves an F-measure of 100%, an increase of 39% over the baseline results, which supports the reliability of the anonymized data for research use.
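The two-step process described in the abstract (identify and categorize PHIs, then replace each PHI with its category) can be illustrated with a minimal Python sketch. This is not the authors' system: the cue words, the place dictionary, the name pattern, the bracketed replacement format, and the function names `identify_phi`, `anonymize`, and `f_measure` are all hypothetical choices made for illustration, since the abstract does not give the actual rules or dictionaries.

```python
import re

# ASSUMPTION: toy dictionary and cue words; the paper's real resources are not
# given in the abstract.
PLACE_NAMES = {"Karachi", "Lahore"}
NAME = r"([A-Z][a-z]+(?:\s[A-Z][a-z]+)*)"  # one or more capitalized tokens

# Each rule pairs a PHI category with a pattern whose group 1 is the PHI span.
RULES = [
    ("Doctors Name", re.compile(r"\b(?:Dr\.?|Doctor)\s+" + NAME)),
    ("Patients Name", re.compile(r"\b(?:Mr\.?|Mrs\.?|Ms\.?|Patient)\s+" + NAME)),
    ("Other Name", re.compile(r"\b(?:brother|sister|wife|husband|son|daughter)\s+" + NAME)),
    ("Place Name", re.compile(
        r"\b(" + "|".join(re.escape(p) for p in sorted(PLACE_NAMES)) + r")\b")),
]


def identify_phi(text):
    """Step 1: return non-overlapping (start, end, category) spans."""
    spans = []
    for category, pattern in RULES:
        for match in pattern.finditer(text):
            spans.append((match.start(1), match.end(1), category))
    # Leftmost span first; on a tie, the longer span wins.
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    resolved, last_end = [], 0
    for start, end, category in spans:
        if start >= last_end:
            resolved.append((start, end, category))
            last_end = end
    return resolved


def anonymize(text):
    """Step 2: replace each identified PHI with its category label."""
    out = text
    # Replace from right to left so earlier offsets stay valid.
    for start, end, category in reversed(identify_phi(text)):
        out = out[:start] + "[" + category + "]" + out[end:]
    return out


def f_measure(tp, fp, fn):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


note = "Patient Ali Raza was seen by Dr. Khan in Karachi with his brother Bilal."
print(anonymize(note))
```

In this sketch, category labels stand in for the original strings, so the narrative stays readable for downstream NLP work while the identifying values are removed. The `f_measure` helper shows the standard balanced F-measure computed from true positives, false positives, and false negatives counted against a gold standard.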
Pages: 612-624
Page count: 13