CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese

Cited by: 4

Authors
Rubel Schneider, Elisa Terumi [1 ]
Gumiel, Yohan Bonescki [2 ]
Andrioli de Souza, Joao Vitor [3 ]
Mukai, Lilian Mie [2 ]
Silva e Oliveira, Lucas Emanuel [3 ]
Rebelo, Marina de Sa [4 ]
Gutierrez, Marco Antonio [4 ]
Krieger, Jose Eduardo [4 ]
Teodoro, Douglas [5 ]
Moro, Claudia [1 ]
Paraiso, Emerson Cabrera [1 ]
Affiliations
[1] Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil
[2] Pontificia Univ Catolica Parana, Inst Heart, InCor, HC FMUSP, Curitiba, Parana, Brazil
[3] Comsentimento, Curitiba, Parana, Brazil
[4] HC FMUSP, InCor, Inst Heart, Sao Paulo, Brazil
[5] Univ Geneva, Geneva, Switzerland
Keywords
natural language processing; transformer; clinical texts; language model
DOI
10.1109/CBMS58004.2023.00247
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Contextual word embeddings and the Transformer architecture have achieved state-of-the-art results in many natural language processing (NLP) tasks and improved the adaptation of models to multiple domains. Despite these advances in model construction and reuse, few resources have been developed for the Portuguese language, especially in the health domain. Furthermore, the clinical models available for the language are not representative of all medical specialties. This work explores deep contextual embedding models for Portuguese to support clinical NLP tasks. We transferred knowledge from electronic health records of a Brazilian tertiary hospital specialized in cardiology and pre-trained multiple clinical BERT-based models. We evaluated the performance of these models in named entity recognition experiments, fine-tuning them on two annotated corpora of clinical narratives. Our pre-trained models outperformed previous multilingual and Portuguese BERT-based models in cardiology and multi-specialty settings, reaching the state of the art on the analyzed corpora, with a 5.5% F1-score improvement on TempClinBr (all entities) and 1.7% on SemClinBr (Disorder entity). Hence, we demonstrate that data representativeness and a high volume of training data can improve results for clinical tasks, in line with findings for other languages.
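As a rough illustration of the workflow the abstract describes (a BERT-style encoder pre-trained on clinical text, then fine-tuned as a token classifier for NER), here is a minimal sketch using the Hugging Face transformers library. The checkpoint name "pucpr/cardiobertpt" and the Disorder-only BIO label set are assumptions for illustration, not the authors' released artifacts.

```python
# Minimal sketch: load a Portuguese clinical BERT checkpoint and run it
# as a token classifier for NER, as described in the abstract.
# NOTE: "pucpr/cardiobertpt" is a hypothetical checkpoint id, and the
# Disorder-only label set is an assumption based on SemClinBr's Disorder entity.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-Disorder", "I-Disorder"]  # BIO scheme for one entity type
model_name = "pucpr/cardiobertpt"  # hypothetical; substitute a real checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels)
)

# Tag a short clinical narrative. The classification head here is randomly
# initialized; in practice it would first be fine-tuned on an annotated
# corpus such as TempClinBr or SemClinBr.
text = "Paciente com insuficiencia cardiaca congestiva."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)
predictions = [labels[i] for i in logits.argmax(-1)[0]]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
print(list(zip(tokens, predictions)))
```

The same token-classification setup covers both evaluation corpora; only the label set and training data change between the cardiology and multi-specialty experiments.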
Pages: 378-381
Page count: 4