A system for de-identifying medical message board text

Cited by: 0
Authors
Adrian Benton
Shawndra Hill
Lyle Ungar
Annie Chung
Charles Leonard
Cristin Freeman
John H Holmes
Affiliations
[1] University of Pennsylvania School of Medicine
[2] University of Pennsylvania
[3] The Wharton School
[4] University of Pennsylvania School of Engineering and Applied Science
Keywords
Word List; Named Entity Recognition; Entity Recognition; Message Board; Doxy
DOI
Not available
Abstract
Users seeking support and information on a wide range of medical conditions have made millions of public posts to medical message boards. It has been shown that these posts can be used to gain a greater understanding of patients’ experiences and concerns. As investigators continue to explore large corpora of medical discussion board data for research purposes, protecting the privacy of the members of these online communities becomes an important challenge. Extant entity recognition methods used for more structured text are not sufficient because message posts present additional challenges: the posts contain many typographical errors, a larger variety of possible names, terms and abbreviations specific to Internet posts or to a particular message board, and mentions of the authors’ personal lives. The main contribution of this paper is a system that automatically de-identifies the authors of message board posts while taking these challenges into account. We demonstrate our system on two different message board corpora, one on breast cancer and another on arthritis. We show that our approach significantly outperforms other publicly available named entity recognition and de-identification systems, which have been tuned for more structured text such as operative reports, pathology reports, discharge summaries, or newswire.
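The abstract does not describe how the system works, so the following is only an illustrative sketch of one of the challenges it names: misspelled names slip past an exact-match word list. It is not the authors' method. The name list, the allow-list of common words, the `deidentify` function, and the 0.8 similarity cutoff are all hypothetical choices made for this example; the fuzzy matching uses Python's standard `difflib` module.

```python
import re
from difflib import get_close_matches

# Hypothetical word list of first names; a real list would be far larger.
NAME_LIST = ["jennifer", "michael", "susan", "robert"]

# Hypothetical allow-list of ordinary words that should never be masked.
COMMON_WORDS = {"my", "sister", "said", "is", "in", "remission"}


def deidentify(post, names=NAME_LIST, cutoff=0.8):
    """Replace tokens that match, or nearly match, a listed name with [NAME]."""

    def mask(match):
        token = match.group(0)
        lowered = token.lower()
        if lowered in COMMON_WORDS:
            return token
        # Fuzzy lookup so that misspellings such as "Jenifer" are still caught.
        if get_close_matches(lowered, names, n=1, cutoff=cutoff):
            return "[NAME]"
        return token

    # Visit every alphabetic token and leave punctuation and spacing untouched.
    return re.sub(r"[A-Za-z]+", mask, post)


print(deidentify("My sister Jenifer said Micheal is in remission"))
# prints: My sister [NAME] said [NAME] is in remission
```

An exact-match lookup would miss both "Jenifer" and "Micheal" here. Lowering the cutoff catches more misspellings but also masks more ordinary words, which is why the sketch pairs the fuzzy match with an allow-list.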