Exogenous and Endogenous Data Augmentation for Low-Resource Complex Named Entity Recognition

Citations: 0

Authors:

Zhang, Xinghua [1,2]

Chen, Gaode [1,2]

Cui, Shiyao [1,2]

Sheng, Jiawei [1,2]

Liu, Tingwen [1,2]

Xu, Hongbo [1,2]

Affiliations:

[1] University of Chinese Academy of Sciences, School of Cyber Security, Beijing, People's Republic of China

[2] Chinese Academy of Sciences, Institute of Information Engineering, Beijing, People's Republic of China

Source:

PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024 | 2024

Keywords:

Knowledge Acquisition; Data Augmentation; Named Entity Recognition; Low-resource learning;

DOI:

10.1145/3626772.3657754

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Low-resource Complex Named Entity Recognition aims to detect entities that may take the form of any linguistic constituent in scenarios with limited manually annotated data. Existing studies augment the text by substituting entities of the same type or by language modeling, but suffer from lower quality and from the limited entity-context patterns available in low-resource corpora. In this paper, we propose a novel data augmentation method, E²DA, that works from both exogenous and endogenous perspectives. For exogenous augmentation, we treat the limited manually annotated data as anchors and leverage the instruction-following capabilities of Large Language Models (LLMs) to expand the anchors by generating data that are highly dissimilar from the original anchor texts in terms of entity mentions and contexts. For endogenous augmentation, we explore diverse semantic directions in the implicit feature space of the original and expanded anchors for effective data augmentation. Our complementary augmentation method from the two perspectives not only continuously expands the global text-level space, but also fully explores the local semantic space for more diverse data augmentation. Extensive experiments on 10 diverse datasets across various low-resource settings demonstrate that the proposed method significantly outperforms prior state-of-the-art data augmentation methods.


Pages: 630-640

Page count: 11
