Attention Temperature Matters in Abstractive Summarization Distillation

Cited by: 0
Authors
Zhang, Shengqiang [1 ]
Zhang, Xingxing [2 ]
Bao, Hangbo [2 ]
Wei, Furu [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recent progress of abstractive text summarization largely relies on large pre-trained sequence-to-sequence Transformer models, which are computationally expensive. This paper aims to distill these large models into smaller ones for faster inference and with minimal performance loss. Pseudo-labeling based methods are popular in sequence-to-sequence model distillation. In this paper, we find simply manipulating attention temperatures in Transformers can make pseudo labels easier to learn for student models. Our experiments on three summarization datasets show our proposed method consistently improves vanilla pseudo-labeling based methods. Further empirical analysis shows that both pseudo labels and summaries produced by our students are shorter and more abstractive. Our code is available at https://github.com/Shengqiang-Zhang/plate.
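The core idea in the abstract is to rescale the attention temperature inside the Transformer's softmax when the teacher generates pseudo labels. As a minimal illustrative sketch (assuming a standard PyTorch scaled dot-product attention; this is not the authors' released implementation, which is in the linked repository, and the function name and signature are hypothetical):

```python
import math
import torch
import torch.nn.functional as F

def attention_with_temperature(q, k, v, temperature=1.0, mask=None):
    """Scaled dot-product attention with an explicit temperature knob.

    temperature > 1 flattens the attention distribution, temperature < 1
    sharpens it, and temperature == 1 recovers standard Transformer attention.
    Illustrative sketch only, not the paper's exact implementation.
    """
    d_k = q.size(-1)
    # Standard sqrt(d_k) scaling, with the temperature folded into the divisor.
    scores = torch.matmul(q, k.transpose(-2, -1)) / (temperature * math.sqrt(d_k))
    if mask is not None:
        # Block masked-out positions before the softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v)
```

Under the abstract's framing, the teacher would run with a non-default temperature when producing pseudo labels for the student; the exact choice and schedule of temperatures is specified in the paper and repository rather than here.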
Pages: 127 - 141
Number of pages: 15
Related Papers (50 in total)
  • [41] Multimodal Abstractive Summarization using bidirectional encoder representations from transformers with attention mechanism
    Argade, Dakshata
    Khairnar, Vaishali
    Vora, Deepali
    Patil, Shruti
    Kotecha, Ketan
    Alfarhood, Sultan
    HELIYON, 2024, 10 (04)
  • [42] A Convolution-Self Attention Abstractive Summarization Method Fusing Sequential Grammar Knowledge
    Luo S.
    Wang R.
    Wu Q.
    Pan L.
    Wu Z.
    Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology, 2021, 41 (01): 93 - 101
  • [43] An abstractive text summarization technique using transformer model with self-attention mechanism
    Sandeep Kumar
    Arun Solanki
    Neural Computing and Applications, 2023, 35 : 18603 - 18622
  • [44] Boundary-Aware Abstractive Summarization with Entity-Augmented Attention for Enhancing Faithfulness
    Li, Jiuyi
    Liu, Junpeng
    Ma, Jianjun
    Yang, Wei
    Huang, Degen
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (04)
  • [46] Bottom-Up Abstractive Summarization
    Gehrmann, Sebastian
    Deng, Yuntian
    Rush, Alexander M.
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 4098 - 4109
  • [47] Reducing repetition in convolutional abstractive summarization
    Liu, Yizhu
    Chen, Xinyue
    Luo, Xusheng
    Zhu, Kenny Q.
    NATURAL LANGUAGE ENGINEERING, 2023, 29 (01) : 81 - 109
  • [48] Abstractive Summarization Model with Adaptive Sparsemax
    Guo, Shiqi
    Si, Yumeng
    Zhao, Jing
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT I, 2022, 13551 : 810 - 821
  • [49] Abstractive Summarization: A Survey of the State of the Art
    Lin, Hui
    Ng, Vincent
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9815 - 9822
  • [50] Learning Cluster Patterns for Abstractive Summarization
    Jo, Sung-Guk
    Park, Seung-Hyeok
    Kim, Jeong-Jae
    On, Byung-Won
    IEEE ACCESS, 2023, 11 : 146065 - 146075