Unifying Cross-lingual Summarization and Machine Translation with Compression Rate

Citations: 5
Authors
Bai, Yu [1 ,2 ]
Huang, Heyan [1 ,3 ]
Fan, Kai [4 ]
Gao, Yang [1 ]
Zhu, Yiming [1 ]
Zhan, Jiaao [1 ]
Chi, Zewen [1 ]
Chen, Boxing [4 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
[2] Beijing Engn Res Ctr High Volume Language Informa, Beijing, Peoples R China
[3] Southeast Acad Informat Technol, Putian, Fujian, Peoples R China
[4] Alibaba DAMO Acad, Machine Intelligence Technol Lab, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-lingual Summarization; Machine Translation; Compression Rate;
DOI
10.1145/3477495.3532071
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Cross-Lingual Summarization (CLS) is the task of extracting the important information from a source document and summarizing it into a summary in another language. It is a challenging task that requires a system to understand, summarize, and translate at the same time, making it closely related to Monolingual Summarization (MS) and Machine Translation (MT). In practice, the training resources for Machine Translation far exceed those for cross-lingual and monolingual summarization, so incorporating a Machine Translation corpus into CLS would be beneficial for its performance. However, existing work only leverages a simple multi-task framework to bring Machine Translation in, lacking deeper exploration. In this paper, we propose a novel task, Cross-lingual Summarization with Compression rate (CSC), to benefit Cross-Lingual Summarization from large-scale Machine Translation corpora. By introducing the compression rate, the information ratio between the source and the target text, we regard the MT task as a special CLS task with a compression rate of 100%. Hence the two tasks can be trained in a unified way, sharing knowledge more effectively. However, a huge gap exists between the MT task and the CLS task, as samples with compression rates between 30% and 90% are extremely rare. To bridge the two tasks smoothly, we propose an effective data augmentation method that produces document-summary pairs with different compression rates. The proposed method not only improves the performance of the CLS task, but also provides controllability to generate summaries of desired lengths. Experiments demonstrate that our method outperforms various strong baselines on three cross-lingual summarization datasets. We release our code and data at https://github.com/ybai-nlp/CLS_CIR.
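The sketch below illustrates the core idea described in the abstract: measuring the compression rate (target-to-source length ratio) of each training pair and encoding it as a control prefix, so that MT pairs (rate near 100%) and CLS pairs (much lower rates) share one input format. This is a minimal illustration only; the whitespace tokenization, the bucketing scheme, and the `<cr_XX>` control-token name are assumptions for exposition and are not taken from the paper's released implementation.

```python
# Minimal sketch (not the authors' implementation): unify MT and CLS pairs
# by tagging each sample with a bucketed compression-rate control token.

def compression_rate(source: str, target: str) -> float:
    """Length ratio of target to source text (~1.0 for a translation pair,
    much lower for a document-summary pair). Whitespace tokenization is a
    simplifying assumption."""
    src_len = max(len(source.split()), 1)
    tgt_len = len(target.split())
    return tgt_len / src_len

def to_unified_sample(source: str, target: str, bucket_size: int = 10) -> str:
    """Prefix the source with a compression-rate token rounded to the
    nearest bucket, so an MT pair and a CLS pair share one input format.
    The '<cr_XX>' token name is illustrative only."""
    rate = min(compression_rate(source, target), 1.0)
    bucket = int(round(rate * 100 / bucket_size) * bucket_size)
    return f"<cr_{bucket}> {source}"

if __name__ == "__main__":
    # MT-style pair: target roughly as long as the source -> <cr_100>
    print(to_unified_sample("the cat sat on the mat",
                            "le chat était assis sur le tapis"))
    # CLS-style pair: short summary of a longer document -> low bucket
    doc = ("the cat sat on the mat all afternoon "
           "while the rain kept falling outside")
    print(to_unified_sample(doc, "le chat est resté à l'intérieur"))
```

At inference time, the same control token could in principle be used to request a summary of a desired length, which is the controllability property the abstract mentions.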
Pages: 1087-1097
Page count: 11