Enhancing low-resource cross-lingual summarization from noisy data with fine-grained reinforcement learning

被引:0
|
作者
Huang, Yuxin [1 ,2 ]
Gu, Huailing [1 ,2 ]
Yu, Zhengtao [1 ,2 ]
Gao, Yumeng [1 ,2 ]
Pan, Tong [1 ,2 ]
Xu, Jialong [1 ,2 ]
机构
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650504, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650504, Peoples R China
基金
中国国家自然科学基金;
关键词
Cross-lingual summarization; Low-resource language; Noisy data; Fine-grained reinforcement learning; Word correlation; Word missing degree; TP391;
D O I
10.1631/FITEE.2300296
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Cross-lingual summarization (CLS) is the task of generating a summary in a target language from a document in a source language. Recently, end-to-end CLS models have achieved impressive results using large-scale, high-quality datasets typically constructed by translating monolingual summary corpora into CLS corpora. However, due to the limited performance of low-resource language translation models, translation noise can seriously degrade the performance of these models. In this paper, we propose a fine-grained reinforcement learning approach to address low-resource CLS based on noisy data. We introduce the source language summary as a gold signal to alleviate the impact of the translated noisy target summary. Specifically, we design a reinforcement reward by calculating the word correlation and word missing degree between the source language summary and the generated target language summary, and combine it with cross-entropy loss to optimize the CLS model. To validate the performance of our proposed model, we construct Chinese-Vietnamese and Vietnamese-Chinese CLS datasets. Experimental results show that our proposed model outperforms the baselines in terms of both the ROUGE score and BERTScore.
引用
收藏
页码:121 / 134
页数:14
相关论文
共 50 条
  • [21] Design Challenges in Low-resource Cross-lingual Entity Linking
    Fu, Xingyu
    Shi, Weijia
    Yu, Xiaodong
    Zhao, Zian
    Roth, Dan
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 6418 - 6432
  • [22] Cross-Lingual Word Embeddings for Low-Resource Language Modeling
    Adams, Oliver
    Makarucha, Adam
    Neubig, Graham
    Bird, Steven
    Cohn, Trevor
    15TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2017), VOL 1: LONG PAPERS, 2017, : 937 - 947
  • [23] Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin
    Lin, Pin-Jie
    Saeed, Muhammed
    Chang, Ernie
    Scholman, Merel
    INTERSPEECH 2023, 2023, : 3954 - 3958
  • [24] Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages
    Nie, Ercong
    Liang, Sheng
    Schmid, Helmut
    Schuetze, Hinrich
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 8320 - 8340
  • [25] Enhancing Low-resource Fine-grained Named Entity Recognition by Leveraging Coarse-grained Datasets
    Lee, Su Ah
    Oh, Seokjin
    Jung, Woohwan
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 3269 - 3279
  • [26] LEARNING CROSS-LINGUAL INFORMATION WITH MULTILINGUAL BLSTM FOR SPEECH SYNTHESIS OF LOW-RESOURCE LANGUAGES
    Yu, Quanjie
    Liu, Peng
    Wu, Zhiyong
    Kang, Shiyin
    Meng, Helen
    Cai, Lianhong
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5545 - 5549
  • [27] ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion
    Casanova, Edresson
    Shulby, Christopher
    Korolev, Alexander
    Candido Junior, Arnaldo
    Soares, Anderson da Silva
    Aluisio, Sandra
    Ponti, Moacir Antonelli
    INTERSPEECH 2023, 2023, : 1244 - 1248
  • [28] Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation
    Zolzaya Byambadorj
    Ryota Nishimura
    Altangerel Ayush
    Kengo Ohta
    Norihide Kitaoka
    EURASIP Journal on Audio, Speech, and Music Processing, 2021
  • [29] Cross-lingual entity alignment based on complex relationships and fine-grained attributes
    Zhu, Beibei
    Tian, Ruijie
    Yuan, Osong
    Han, Ridong
    Yang, Yan
    Fu, Bo
    APPLIED SOFT COMPUTING, 2025, 172
  • [30] Text-to-speech system for low-resource language using cross-lingual transfer learning and data augmentation
    Byambadorj, Zolzaya
    Nishimura, Ryota
    Ayush, Altangerel
    Ohta, Kengo
    Kitaoka, Norihide
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)