Enhancing low-resource cross-lingual summarization from noisy data with fine-grained reinforcement learning

Cited by: 0
Authors
Huang, Yuxin [1, 2]
Gu, Huailing [1, 2]
Yu, Zhengtao [1, 2]
Gao, Yumeng [1, 2]
Pan, Tong [1, 2]
Xu, Jialong [1, 2]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650504, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650504, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-lingual summarization; Low-resource language; Noisy data; Fine-grained reinforcement learning; Word correlation; Word missing degree; TP391;
DOI
10.1631/FITEE.2300296
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cross-lingual summarization (CLS) is the task of generating a summary in a target language from a document in a source language. Recently, end-to-end CLS models have achieved impressive results using large-scale, high-quality datasets typically constructed by translating monolingual summary corpora into CLS corpora. However, due to the limited performance of low-resource language translation models, translation noise can seriously degrade the performance of these models. In this paper, we propose a fine-grained reinforcement learning approach to address low-resource CLS based on noisy data. We introduce the source language summary as a gold signal to alleviate the impact of the translated noisy target summary. Specifically, we design a reinforcement reward by calculating the word correlation and word missing degree between the source language summary and the generated target language summary, and combine it with cross-entropy loss to optimize the CLS model. To validate the performance of our proposed model, we construct Chinese-Vietnamese and Vietnamese-Chinese CLS datasets. Experimental results show that our proposed model outperforms the baselines in terms of both the ROUGE score and BERTScore.
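The abstract does not give the formulas for the reward or for how it is combined with cross-entropy, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical word-alignment probability table `align_prob`, toy definitions of word correlation (mean best-alignment score) and word missing degree (fraction of source summary words without a sufficiently aligned generated word), and a commonly used mixed objective with a hypothetical weight `gamma`.

```python
def toy_reward(src_summary, generated, align_prob, threshold=0.5):
    """Toy reward (an assumption, not the paper's formula):
    word correlation minus word missing degree."""
    if not src_summary:
        return 0.0
    # Best alignment score of each source-summary word against the generated words.
    best = [
        max((align_prob.get((s, t), 0.0) for t in generated), default=0.0)
        for s in src_summary
    ]
    correlation = sum(best) / len(best)
    # A source word counts as missing if nothing in the output aligns well enough.
    missing = sum(1 for b in best if b < threshold) / len(best)
    return correlation - missing


def mixed_loss(token_log_probs, reward, baseline, ce_loss, gamma=0.5):
    """Assumed mixed objective: gamma * policy-gradient loss + (1 - gamma) * cross-entropy."""
    rl_loss = -(reward - baseline) * sum(token_log_probs)
    return gamma * rl_loss + (1.0 - gamma) * ce_loss


# Hypothetical Chinese-to-Vietnamese alignment probabilities.
align_prob = {("猫", "mèo"): 0.9, ("睡觉", "ngủ"): 0.8}
src = ["猫", "睡觉"]

r_full = toy_reward(src, ["mèo", "ngủ"], align_prob)
r_partial = toy_reward(src, ["mèo"], align_prob)
loss = mixed_loss([-0.1, -0.2], r_full, baseline=0.0, ce_loss=1.0)
```

In this toy setup, an output that covers both source summary words receives a higher reward than one that drops a word, which is the kind of word-level signal the abstract describes.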
Pages: 121-134
Number of pages: 14
Related papers
50 in total
  • [31] Robust learning from noisy web data for fine-Grained recognition
    Cai, Zhenhuang
    Xie, Guo-Sen
    Huang, Xingguo
    Huang, Dan
    Yao, Yazhou
    Tang, Zhenmin
    PATTERN RECOGNITION, 2023, 134
  • [32] Improving Candidate Generation for Low-resource Cross-lingual Entity Linking
    Zhou, Shuyan
    Rijhwani, Shruti
    Wieting, John
    Carbonell, Jaime
    Neubig, Graham
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2020, 8 : 109 - 124
  • [33] Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification
    Zhang, Mozhi
    Fujinuma, Yoshinari
    Boyd-Graber, Jordan
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9547 - 9554
  • [34] Improving Low-Resource Cross-lingual Parsing with Expected Statistic Regularization
    Effland, Thomas
    Collins, Michael
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, 11 : 122 - 138
  • [35] Unsupervised Ranked Cross-Lingual Lexical Substitution for Low-Resource Languages
    Ecker, Stefan
    Horbach, Andrea
    Thater, Stefan
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 1709 - 1717
  • [36] Cross-Lingual and Ensemble MLPs Strategies for Low-Resource Speech Recognition
    Qian, Yanmin
    Liu, Jia
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2581 - 2584
  • [37] Cross-lingual Sentence Embedding for Low-resource Chinese-Vietnamese Based on Contrastive Learning
    Huang, Yuxin
    Liang, Yin
    Wu, Zhaoyuan
    Zhu, Enchang
    Yu, Zhengtao
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [38] Multi-speaker TTS system for low-resource language using cross-lingual transfer learning and data augmentation
    Byambadorj, Zolzaya
    Nishimura, Ryota
    Ayush, Altangerel
    Ohta, Kengo
    Kitaoka, Norihide
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 849 - 853
  • [39] Curriculum-Style Fine-Grained Adaption for Unsupervised Cross-Lingual Dependency Transfer
    Guo, Peiming
    Huang, Shen
    Jiang, Peijie
    Sun, Yueheng
    Zhang, Meishan
    Zhang, Min
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 322 - 332
  • [40] How Can Cross-lingual Knowledge Contribute Better to Fine-Grained Entity Typing?
    Jin, Hailong
    Dong, Tiansi
    Hou, Lei
    Li, Juanzi
    Chen, Hui
    Dai, Zelin
    Qu, Yincen
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 3071 - 3081