Unifying Cross-lingual Summarization and Machine Translation with Compression Rate

Cited by: 5
Authors
Bai, Yu [1 ,2 ]
Huang, Heyan [1 ,3 ]
Fan, Kai [4 ]
Gao, Yang [1 ]
Zhu, Yiming [1 ]
Zhan, Jiaao [1 ]
Chi, Zewen [1 ]
Chen, Boxing [4 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
[2] Beijing Engn Res Ctr High Volume Language Informa, Beijing, Peoples R China
[3] Southeast Acad Informat Technol, Putian, Fujian, Peoples R China
[4] Alibaba DAMO Acad, Machine Intelligence Technol Lab, Hangzhou, Peoples R China
Source
PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22) | 2022
Funding
National Natural Science Foundation of China
Keywords
Cross-lingual Summarization; Machine Translation; Compression Rate;
DOI
10.1145/3477495.3532071
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Cross-Lingual Summarization (CLS) is the task of extracting the important information from a source document and condensing it into a summary in another language. It is challenging because a system must understand, summarize, and translate at the same time, making it closely related to Monolingual Summarization (MS) and Machine Translation (MT). In practice, training resources for Machine Translation far exceed those for cross-lingual and monolingual summarization, so incorporating a Machine Translation corpus into CLS should benefit its performance. However, prior work leverages only a simple multi-task framework to bring Machine Translation in, without deeper exploration. In this paper, we propose a novel task, Cross-lingual Summarization with Compression rate (CSC), which improves Cross-Lingual Summarization with a large-scale Machine Translation corpus. By introducing the compression rate, the information ratio between the source and the target text, we regard the MT task as a special CLS task with a compression rate of 100%. The two tasks can then be trained in a unified way, sharing knowledge more effectively. However, a large gap exists between the MT task and the CLS task: samples with compression rates between 30% and 90% are extremely rare. To bridge the two tasks smoothly, we propose an effective data augmentation method that produces document-summary pairs at different compression rates. The proposed method not only improves the performance of the CLS task but also provides controllability for generating summaries of desired lengths. Experiments demonstrate that our method outperforms various strong baselines on three cross-lingual summarization datasets. We release our code and data at https://github.com/ybai-nlp/CLS_CIR.
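The compression rate defined in the abstract can be sketched as follows. This is an illustrative approximation, not the authors' code: the "information ratio" is approximated here simply as a token-length ratio, under which an MT pair (a faithful translation) sits near 100% and a document-summary pair sits much lower.

```python
def compression_rate(source_tokens, target_tokens):
    """Target-to-source length ratio, as a percentage.

    A length-based proxy for the information ratio between the
    source and the target text described in the paper.
    """
    return 100.0 * len(target_tokens) / len(source_tokens)


# An MT-style pair: the translation preserves all information,
# so its compression rate is close to 100%.
mt_rate = compression_rate("the cat sat on the mat".split(),
                           "le chat dormait sur le tapis".split())

# A CLS-style pair: the summary heavily compresses the document.
cls_rate = compression_rate(("a long source document " * 10).split(),
                            "short summary".split())

print(mt_rate, cls_rate)  # → 100.0 5.0
```

Under this view, mixing MT pairs (rate near 100%) with summarization pairs (rate well below 30%) leaves the 30%-90% range sparsely covered, which is the gap the paper's data augmentation method targets.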
Pages: 1087-1097
Number of pages: 11
Related Papers
50 items total
  • [31] Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation
    Siddhant, Aditya
    Johnson, Melvin
    Tsai, Henry
    Ari, Naveen
    Riesa, Jason
    Bapna, Ankur
    Firat, Orhan
    Raman, Karthik
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8854 - 8861
  • [32] TeacherSim: Cross-lingual Machine Translation Evaluation with Monolingual Embedding as Teacher
    Yang, Hao
    Zhang, Min
    Tao, Shimin
    Ma, Miaomiao
    Qin, Ying
    Wei, Daimeng
    2023 25TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY, ICACT, 2023, : 283 - 287
  • [33] Evaluating Effects of Machine Translation Accuracy on Cross-Lingual Patent Retrieval
    Fujii, Atsushi
    Utiyama, Masao
    Yamamoto, Mikio
    Utsuro, Takehito
    PROCEEDINGS 32ND ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2009, : 674 - 675
  • [34] A Variational Hierarchical Model for Neural Cross-Lingual Summarization
    Liang, Yunlong
    Meng, Fandong
    Zhou, Chulun
    Xu, Jinan
    Chen, Yufeng
    Su, Jinsong
    Zhou, Jie
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 2088 - 2099
  • [35] CAKES: Cross-lingual Wikipedia Knowledge Enrichment and Summarization
    Fionda, Valeria
    Pirro, Giuseppe
    20TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2012), 2012, 242 : 901 - 902
  • [36] Cross-Lingual Korean Speech-to-Text Summarization
    Yoon, HyoJeon
    Dinh Tuyen Hoang
    Ngoc Thanh Nguyen
    Hwang, Dosam
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2019, PT I, 2019, 11431 : 198 - 206
  • [37] clstk: The Cross-Lingual Summarization Tool-Kit
    Jhaveri, Nisarg
    Gupta, Manish
    Varma, Vasudeva
    PROCEEDINGS OF THE TWELFTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING (WSDM'19), 2019, : 766 - 769
  • [38] Cross-Lingual Machine Reading Comprehension
    Cui, Yiming
    Che, Wanxiang
    Liu, Ting
    Qin, Bing
    Wang, Shijin
    Hu, Guoping
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 1586 - 1595
  • [39] Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation
    Zhang, Ran
    Ouni, Jihed
    Eger, Steffen
    COMPUTATIONAL LINGUISTICS, 2024, 50 (03) : 1001 - 1047
  • [40] Cross-lingual question answering using off-the-shelf machine translation
    Ahn, K
    Alex, B
    Bos, J
    Dalmas, T
    Leidner, JL
    Smillie, MB
    MULTILINGUAL INFORMATION ACCESS FOR TEXT, SPEECH AND IMAGES, 2005, 3491 : 446 - 457