Attention Temperature Matters in Abstractive Summarization Distillation

Cited by: 0
Authors
Zhang, Shengqiang [1 ]
Zhang, Xingxing [2 ]
Bao, Hangbo [2 ]
Wei, Furu [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Keywords
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Recent progress in abstractive text summarization largely relies on large pre-trained sequence-to-sequence Transformer models, which are computationally expensive. This paper aims to distill these large models into smaller ones for faster inference with minimal performance loss. Pseudo-labeling based methods are popular in sequence-to-sequence model distillation. In this paper, we find that simply manipulating attention temperatures in Transformers can make pseudo labels easier to learn for student models. Our experiments on three summarization datasets show our proposed method consistently improves vanilla pseudo-labeling based methods. Further empirical analysis shows that both pseudo labels and summaries produced by our students are shorter and more abstractive. Our code is available at https://github.com/Shengqiang-Zhang/plate.
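The central knob described in the abstract is the attention temperature: rescaling the attention logits before the softmax so that the teacher's pseudo labels become easier for a smaller student to learn. The sketch below is a minimal illustration of that knob in standard scaled dot-product attention; it assumes PyTorch, and the function name, tensor shapes, and example temperature values are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

import math
import torch
import torch.nn.functional as F

def attention_with_temperature(query, key, value, temperature=1.0):
    # Scaled dot-product attention with an explicit temperature.
    # temperature == 1.0 recovers vanilla Transformer attention;
    # temperature > 1.0 flattens (smooths) the attention distribution,
    # temperature < 1.0 sharpens it.
    d_k = query.size(-1)
    scores = query @ key.transpose(-2, -1) / (temperature * math.sqrt(d_k))
    weights = F.softmax(scores, dim=-1)
    return weights @ value, weights

# Toy comparison: a larger temperature yields a flatter (higher-entropy)
# attention distribution. Shapes are (batch, heads, seq_len, head_dim).
q = torch.randn(2, 4, 8, 16)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)
_, w_default = attention_with_temperature(q, k, v, temperature=1.0)
_, w_smooth = attention_with_temperature(q, k, v, temperature=2.0)
print(w_default[0, 0, 0].max().item(), w_smooth[0, 0, 0].max().item())

How the temperature is set when the teacher decodes pseudo labels, and how it is restored on the student side, are details of the paper beyond what the abstract states; the sketch only exposes the parameter being manipulated.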
Pages: 127-141
Number of pages: 15
Related Papers
50 items in total
  • [1] Neural Abstractive Summarization with Structural Attention
    Chowdhury, Tanya
    Kumar, Sachin
    Chakraborty, Tanmoy
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 3716 - 3722
  • [2] Attention Optimization for Abstractive Document Summarization
    Gui, Min
    Tian, Junfeng
    Wang, Rui
    Yang, Zhenglu
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 1222 - 1228
  • [3] Attention based Abstractive Summarization of Malayalam Document
    Nambiar, Sindhya K.
    Peter, David S.
    Idicula, Sumam Mary
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 250 - 257
  • [4] Contrastive Attention Mechanism for Abstractive Sentence Summarization
    Duan, Xiangyu
    Yu, Hongfei
    Yin, Mingming
    Zhang, Min
    Luo, Weihua
    Zhang, Yue
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 3044 - 3053
  • [5] Abstractive Summarization with Keyword and Generated Word Attention
    Wang, Qianlong
    Ren, Jiangtao
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019
  • [6] Attention History-based Attention for Abstractive Text Summarization
    Lee, Hyunsoo
    Choi, YunSeok
    Lee, Jee-Hyong
    PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20), 2020, : 1075 - 1081
  • [7] Enhancing abstractive summarization of implicit datasets with contrastive attention
    Kwon, S.
    Lee, Y.
    Neural Computing and Applications, 2024, 36 (25) : 15337 - 15351
  • [8] Abstractive Text Summarization Using Enhanced Attention Model
    Roul, Rajendra Kumar
    Joshi, Pratik Madhav
    Sahoo, Jajati Keshari
    INTELLIGENT HUMAN COMPUTER INTERACTION (IHCI 2019), 2020, 11886 : 63 - 76
  • [9] Introducing bidirectional attention for autoregressive models in abstractive summarization
    Zhao, Jianfei
    Sun, Xin
    Feng, Chong
    INFORMATION SCIENCES, 2025, 689
  • [10] Abstractive Text Summarization with Multi-Head Attention
    Li, Jinpeng
    Zhang, Chuang
    Chen, Xiaojun
    Cao, Yanan
    Liao, Pengcheng
    Zhang, Peng
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019