Attention Temperature Matters in Abstractive Summarization Distillation

Cited by: 0
Authors
Zhang, Shengqiang [1 ]
Zhang, Xingxing [2 ]
Bao, Hangbo [2 ]
Wei, Furu [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent progress in abstractive text summarization largely relies on large pre-trained sequence-to-sequence Transformer models, which are computationally expensive. This paper aims to distill these large models into smaller ones for faster inference with minimal performance loss. Pseudo-labeling based methods are popular in sequence-to-sequence model distillation. In this paper, we find that simply manipulating attention temperatures in Transformers can make pseudo labels easier for student models to learn. Our experiments on three summarization datasets show that our proposed method consistently improves over vanilla pseudo-labeling based methods. Further empirical analysis shows that both the pseudo labels and the summaries produced by our students are shorter and more abstractive. Our code is available at https://github.com/Shengqiang-Zhang/plate.
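The core mechanism the abstract refers to, scaling attention logits by a temperature before the softmax, can be illustrated with a minimal PyTorch sketch. This is an illustration only, not the authors' implementation (which lives in the linked repository); the function name `scaled_attention` and the parameter name `tau` are hypothetical.

```python
# Minimal sketch of temperature-scaled dot-product attention.
# NOTE: illustrative only; `tau` is an assumed parameter name,
# not taken from the PLATE codebase.
import torch
import torch.nn.functional as F

def scaled_attention(q, k, v, tau=1.0):
    """Scaled dot-product attention with an extra temperature factor tau."""
    d_k = q.size(-1)
    # Dividing the logits by tau > 1 flattens the attention distribution;
    # tau < 1 sharpens it toward the largest logit.
    scores = q @ k.transpose(-2, -1) / (d_k ** 0.5 * tau)
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Example: compare a sharpened and a smoothed attention distribution.
q = torch.randn(1, 4, 8)   # (batch, query_len, d_k)
k = torch.randn(1, 6, 8)   # (batch, key_len, d_k)
v = torch.randn(1, 6, 8)
out_sharp = scaled_attention(q, k, v, tau=0.5)
out_smooth = scaled_attention(q, k, v, tau=2.0)
```

A higher temperature spreads attention weights more evenly over the keys, while a lower one concentrates them; the abstract's claim is that manipulating this value when generating pseudo labels makes them easier for smaller student models to learn.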
Pages: 127-141
Number of pages: 15
Related Papers
50 records in total
  • [21] Controllable Abstractive Summarization
    Fan, Angela
    Grangier, David
    Auli, Michael
    NEURAL MACHINE TRANSLATION AND GENERATION, 2018: 45-54
  • [22] Diversity driven attention model for query-based abstractive summarization
    Nema, Preksha
    Khapra, Mitesh M.
    Laha, Anirban
    Ravindran, Balaraman
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017: 1063-1072
  • [23] Neural Attention Model for Abstractive Text Summarization Using Linguistic Feature Space
    Dilawari, Aniqa
    Khan, Muhammad Usman Ghani
    Saleem, Summra
    Zahoor-Ur-Rehman
    Shaikh, Fatema Sabeen
    IEEE ACCESS, 2023, 11: 23557-23564
  • [24] An Abstractive Summarization Model Based on Joint-Attention Mechanism and a Priori Knowledge
    Li, Yuanyuan
    Huang, Yuan
    Huang, Weijian
    Yu, Junhao
    Huang, Zheng
    APPLIED SCIENCES-BASEL, 2023, 13 (07)
  • [25] See, hear, read: Leveraging multimodality with guided attention for abstractive text summarization
    Atri, Yash Kumar
    Pramanick, Shraman
    Goyal, Vikram
    Chakraborty, Tanmoy
    KNOWLEDGE-BASED SYSTEMS, 2021, 227
  • [26] Extractive-Abstractive Summarization of Judgment Documents Using Multiple Attention Networks
    Gao, Yan
    Liu, Zhengtao
    Li, Juan
    Guo, Fan
    Xiao, Fei
    LOGIC AND ARGUMENTATION, CLAR 2021, 2021, 13040: 486-494
  • [27] Summary-aware attention for social media short text abstractive summarization
    Wang, Qianlong
    Ren, Jiangtao
    NEUROCOMPUTING, 2021, 425: 290-299
  • [28] KAAS: A Keyword-Aware Attention Abstractive Summarization Model for Scientific Articles
    Li, Shuaimin
    Xu, Jungang
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT III, 2022: 263-271
  • [29] Abstractive Event Summarization on Twitter
    Li, Quanzhi
    Zhang, Qiong
    WWW'20: COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2020, 2020: 22-23
  • [30] Source Identification in Abstractive Summarization
    Suhara, Yoshi
    Alikaniotis, Dimitris
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024: 212-224