Enhancing low-resource cross-lingual summarization from noisy data with fine-grained reinforcement learning

Authors
Huang, Yuxin [1, 2]
Gu, Huailing [1, 2]
Yu, Zhengtao [1, 2]
Gao, Yumeng [1, 2]
Pan, Tong [1, 2]
Xu, Jialong [1, 2]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650504, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650504, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-lingual summarization; Low-resource language; Noisy data; Fine-grained reinforcement learning; Word correlation; Word missing degree; TP391
DOI
10.1631/FITEE.2300296
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cross-lingual summarization (CLS) is the task of generating a summary in a target language from a document in a source language. Recently, end-to-end CLS models have achieved impressive results using large-scale, high-quality datasets typically constructed by translating monolingual summary corpora into CLS corpora. However, due to the limited performance of low-resource language translation models, translation noise can seriously degrade the performance of these models. In this paper, we propose a fine-grained reinforcement learning approach to address low-resource CLS based on noisy data. We introduce the source language summary as a gold signal to alleviate the impact of the translated noisy target summary. Specifically, we design a reinforcement reward by calculating the word correlation and word missing degree between the source language summary and the generated target language summary, and combine it with cross-entropy loss to optimize the CLS model. To validate the performance of our proposed model, we construct Chinese-Vietnamese and Vietnamese-Chinese CLS datasets. Experimental results show that our proposed model outperforms the baselines in terms of both the ROUGE score and BERTScore.
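The abstract does not give the reward formulas, so the following is only a minimal sketch of how a reward built from word correlation and word missing degree might be mixed with cross-entropy loss. Everything in it is an assumption made for illustration: the bilingual correlation table `table`, the `threshold` for counting a source word as missing, the reward form (correlation minus missing degree), the REINFORCE-style policy-gradient term, and the mixing weight `gamma`. None of these are taken from the paper.

```python
# Hypothetical sketch only: names, formulas, and weights are assumptions,
# not the method described in the paper.

def word_correlation(src_words, gen_words, table):
    """Mean, over generated target words, of the best correlation score
    with any word of the source-language summary."""
    if not gen_words:
        return 0.0
    total = 0.0
    for g in gen_words:
        total += max((table.get((s, g), 0.0) for s in src_words), default=0.0)
    return total / len(gen_words)


def word_missing_degree(src_words, gen_words, table, threshold=0.5):
    """Fraction of source-summary words that have no generated word
    correlated with them at or above `threshold` (assumed cutoff)."""
    if not src_words:
        return 0.0
    missing = 0
    for s in src_words:
        best = max((table.get((s, g), 0.0) for g in gen_words), default=0.0)
        if best < threshold:
            missing += 1
    return missing / len(src_words)


def reward(src_words, gen_words, table, threshold=0.5):
    """Assumed reward: high correlation is good, missing words are penalized."""
    return (word_correlation(src_words, gen_words, table)
            - word_missing_degree(src_words, gen_words, table, threshold))


def mixed_loss(ce_loss, sample_log_prob, r, gamma=0.5):
    """Combine a REINFORCE-style term (-r * log p(sample)) with cross-entropy.
    `gamma` is an assumed mixing weight in [0, 1]."""
    rl_loss = -r * sample_log_prob
    return gamma * rl_loss + (1.0 - gamma) * ce_loss


if __name__ == "__main__":
    # Toy correlation table with placeholder tokens (hypothetical values).
    toy_table = {("zh_economy", "vi_economy"): 0.9,
                 ("zh_growth", "vi_growth"): 0.8}
    source_summary = ["zh_economy", "zh_growth"]
    generated_summary = ["vi_economy"]  # drops the "growth" concept
    r = reward(source_summary, generated_summary, toy_table)
    print(mixed_loss(ce_loss=2.0, sample_log_prob=-3.0, r=r))
```

In this sketch, a generated summary that omits a source-summary concept receives a lower reward, which is one plausible way to let the clean source-language summary counteract noise in the translated target summary.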
Pages: 121-134
Number of pages: 14