Attention Temperature Matters in Abstractive Summarization Distillation

Cited by: 0
Authors
Zhang, Shengqiang [1 ]
Zhang, Xingxing [2 ]
Bao, Hangbo [2 ]
Wei, Furu [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recent progress of abstractive text summarization largely relies on large pre-trained sequence-to-sequence Transformer models, which are computationally expensive. This paper aims to distill these large models into smaller ones for faster inference and with minimal performance loss. Pseudo-labeling based methods are popular in sequence-to-sequence model distillation. In this paper, we find simply manipulating attention temperatures in Transformers can make pseudo labels easier to learn for student models. Our experiments on three summarization datasets show our proposed method consistently improves vanilla pseudo-labeling based methods. Further empirical analysis shows that both pseudo labels and summaries produced by our students are shorter and more abstractive. Our code is available at https://github.com/Shengqiang-Zhang/plate.
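The core idea in the abstract is to rescale the attention temperature inside the Transformer's softmax when the teacher generates pseudo labels. As a minimal illustrative sketch (assuming a standard PyTorch scaled dot-product attention; this is not the authors' released implementation, which is in the linked repository, and the function name and signature are hypothetical):

```python
import math
import torch
import torch.nn.functional as F

def attention_with_temperature(q, k, v, temperature=1.0, mask=None):
    """Scaled dot-product attention with an explicit temperature knob.

    temperature > 1 flattens the attention distribution, temperature < 1
    sharpens it, and temperature == 1 recovers standard Transformer attention.
    Illustrative sketch only, not the paper's exact implementation.
    """
    d_k = q.size(-1)
    # Standard sqrt(d_k) scaling, with the temperature folded into the divisor.
    scores = torch.matmul(q, k.transpose(-2, -1)) / (temperature * math.sqrt(d_k))
    if mask is not None:
        # Block masked-out positions before the softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    attn = F.softmax(scores, dim=-1)
    return torch.matmul(attn, v)
```

Under the abstract's framing, the teacher would run with a non-default temperature when producing pseudo labels for the student; the exact choice and schedule of temperatures is specified in the paper and repository rather than here.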
Pages: 127 - 141
Number of pages: 15
Related Papers (50 in total)
  • [41] Multimodal Abstractive Summarization using bidirectional encoder representations from transformers with attention mechanism
    Argade, Dakshata
    Khairnar, Vaishali
    Vora, Deepali
    Patil, Shruti
    Kotecha, Ketan
    Alfarhood, Sultan
    HELIYON, 2024, 10 (04)
  • [42] A Convolution-Self Attention Abstractive Summarization Method Fusing Sequential Grammar Knowledge
    Luo S.
    Wang R.
    Wu Q.
    Pan L.
    Wu Z.
    Beijing Ligong Daxue Xuebao/Transaction of Beijing Institute of Technology, 2021, 41 (01): 93 - 101
  • [43] An abstractive text summarization technique using transformer model with self-attention mechanism
    Sandeep Kumar
    Arun Solanki
    Neural Computing and Applications, 2023, 35 : 18603 - 18622
  • [44] Boundary-Aware Abstractive Summarization with Entity-Augmented Attention for Enhancing Faithfulness
    Li, Jiuyi
    Liu, Junpeng
    Ma, Jianjun
    Yang, Wei
    Huang, Degen
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (04)
  • [46] Bottom-Up Abstractive Summarization
    Gehrmann, Sebastian
    Deng, Yuntian
    Rush, Alexander M.
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 4098 - 4109
  • [47] Reducing repetition in convolutional abstractive summarization
    Liu, Yizhu
    Chen, Xinyue
    Luo, Xusheng
    Zhu, Kenny Q.
    NATURAL LANGUAGE ENGINEERING, 2023, 29 (01) : 81 - 109
  • [48] Abstractive Summarization Model with Adaptive Sparsemax
    Guo, Shiqi
    Si, Yumeng
    Zhao, Jing
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT I, 2022, 13551 : 810 - 821
  • [49] Abstractive Summarization: A Survey of the State of the Art
    Lin, Hui
    Ng, Vincent
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 9815 - 9822
  • [50] Learning Cluster Patterns for Abstractive Summarization
    Jo, Sung-Guk
    Park, Seung-Hyeok
    Kim, Jeong-Jae
    On, Byung-Won
    IEEE ACCESS, 2023, 11 : 146065 - 146075