Augmenting a Spanish clinical dataset for transformer-based linking of negations and their out-of-scope references

Cited by: 0
Authors
Tamayo-Herrera, Antonio Jesus [1]
Burgos, Diego A. [2]
Gelbukh, Alexander [1]
Affiliations
[1] Ctr Invest Comp, Inst Politecn Nacl, Mexico City, Mexico
[2] Wake Forest Univ, Winston Salem, NC USA
Source
NATURAL LANGUAGE PROCESSING | 2025 / Vol. 31 / Issue 01
Keywords
machine translation; evaluation; MACHINE-LEARNING APPROACH; GOLD-STANDARD CORPUS; SPECULATION DETECTION; RECOGNITION; DOCUMENTS;
DOI
10.1017/nlp.2024.10
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
A negated statement consists of three main components: the negation cue, the negation scope, and the negation reference. The negation cue is the indicator of negation, while the negation scope defines the extent of the negation. The negation reference, which may or may not be within the negation scope, is the part of the statement being negated. Although there has been considerable research on the negation cue and scope, little attention has been given to identifying negation references outside the scope, even though they make up almost half of all negations. In this study, our goal is to identify out-of-scope references (OSRs) to restore the meaning of truncated negated statements identified by negation detection systems. To achieve this, we augment the largest available Spanish clinical dataset by adding annotations for OSRs. Additionally, we fine-tune five robust BERT-based models using transfer learning to address negation detection, uncertainty detection, and OSR identification and linking with their respective negation scopes. Our best model achieves state-of-the-art performance in negation detection while also establishing a competitive baseline for OSR identification (Macro F1 = 0.56) and linking (Macro F1 = 0.86). We support these findings with relevant statistics from the newly annotated dataset and an extensive review of existing literature.
Pages: 56-89
Number of pages: 34