Exogenous and Endogenous Data Augmentation for Low-Resource Complex Named Entity Recognition

Cited by: 0
Authors
Zhang, Xinghua [1, 2]
Chen, Gaode [1, 2]
Cui, Shiyao [1, 2]
Sheng, Jiawei [1, 2]
Liu, Tingwen [1, 2]
Xu, Hongbo [1, 2]
Affiliations
[1] University of Chinese Academy of Sciences, School of Cyber Security, Beijing, People's Republic of China
[2] Chinese Academy of Sciences, Institute of Information Engineering, Beijing, People's Republic of China
Keywords
Knowledge Acquisition; Data Augmentation; Named Entity Recognition; Low-resource learning;
DOI
10.1145/3626772.3657754
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Low-resource Complex Named Entity Recognition aims to detect entities that may take the form of any linguistic constituent in scenarios with limited manually annotated data. Existing studies augment the text by substituting entities of the same type or by language modeling, but suffer from low quality and from the limited entity context patterns within low-resource corpora. In this paper, we propose a novel data augmentation method, E²DA, from both exogenous and endogenous perspectives. For exogenous augmentation, we treat the limited manually annotated data as anchors, and leverage the powerful instruction-following capabilities of Large Language Models (LLMs) to expand the anchors by generating data that are highly dissimilar from the original anchor texts in terms of entity mentions and contexts. For endogenous augmentation, we explore diverse semantic directions in the implicit feature space of the original and expanded anchors for effective data augmentation. Our complementary augmentation method from two perspectives not only continuously expands the global text-level space, but also fully explores the local semantic space for more diverse data augmentation. Extensive experiments on 10 diverse datasets across various low-resource settings demonstrate that the proposed method significantly outperforms prior state-of-the-art data augmentation methods.
Pages: 630-640
Page count: 11