CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese

Cited by: 4

Authors:

Rubel Schneider, Elisa Terumi [1]

Gumiel, Yohan Bonescki [2]

Andrioli de Souza, Joao Vitor [3]

Mukai, Lilian Mie [2]

Silva e Oliveira, Lucas Emanuel [3]

Rebelo, Marina de Sa [4]

Gutierrez, Marco Antonio [4]

Krieger, Jose Eduardo [4]

Teodoro, Douglas [5]

Moro, Claudia [1]

Paraiso, Emerson Cabrera [1]

Affiliations:

[1] Pontificia Univ Catolica Parana, Curitiba, Parana, Brazil

[2] Pontificia Univ Catolica Parana, Inst Heart, InCor, HC FMUSP, Curitiba, Parana, Brazil

[3] Comsentimento, Curitiba, Parana, Brazil

[4] HC FMUSP, InCor, Inst Heart, Sao Paulo, Brazil

[5] Univ Geneva, Geneva, Switzerland

Source:

2023 IEEE 36TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS | 2023

Keywords:

natural language processing; transformer; clinical texts; language model

DOI:

10.1109/CBMS58004.2023.00247

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Contextual word embeddings and the Transformer architecture have achieved state-of-the-art results in many natural language processing (NLP) tasks and improved the adaptation of models to multiple domains. Despite improvements in the reuse and construction of models, few resources have yet been developed for the Portuguese language, especially in the health domain. Furthermore, the clinical models available for the language are not representative enough of all medical specialties. This work explores deep contextual embedding models for the Portuguese language to support clinical NLP tasks. We transferred learned information from electronic health records of a Brazilian tertiary hospital specializing in cardiology and pre-trained multiple clinical BERT-based models. We evaluated the performance of these models in named entity recognition experiments, fine-tuning them on two annotated corpora containing clinical narratives. Our pre-trained models outperformed previous multilingual and Portuguese BERT-based models in cardiology and multi-specialty settings, reaching the state of the art for the analyzed corpora, with F1 score improvements of 5.5% on TempClinBr (all entities) and 1.7% on SemClinBr (Disorder entity). Hence, we demonstrate that data representativeness and a high volume of training data can improve results on clinical tasks, in line with results for other languages.


Pages: 378-381

Number of pages: 4

