Attention Temperature Matters in Abstractive Summarization Distillation

Cited by: 0
Authors
Zhang, Shengqiang [1 ]
Zhang, Xingxing [2 ]
Bao, Hangbo [2 ]
Wei, Furu [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent progress in abstractive text summarization largely relies on large pre-trained sequence-to-sequence Transformer models, which are computationally expensive. This paper aims to distill these large models into smaller ones for faster inference with minimal performance loss. Pseudo-labeling based methods are popular in sequence-to-sequence model distillation. In this paper, we find that simply manipulating attention temperatures in Transformers can make pseudo labels easier for student models to learn. Our experiments on three summarization datasets show that our proposed method consistently improves over vanilla pseudo-labeling based methods. Further empirical analysis shows that both the pseudo labels and the summaries produced by our students are shorter and more abstractive. Our code is available at https://github.com/Shengqiang-Zhang/plate.
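The core mechanism the abstract refers to, scaling attention logits by a temperature before the softmax, can be illustrated with a minimal PyTorch sketch. This is an illustration only, not the authors' implementation (which lives in the linked repository); the function name `scaled_attention` and the parameter name `tau` are hypothetical.

```python
# Minimal sketch of temperature-scaled dot-product attention.
# NOTE: illustrative only; `tau` is an assumed parameter name,
# not taken from the PLATE codebase.
import torch
import torch.nn.functional as F

def scaled_attention(q, k, v, tau=1.0):
    """Scaled dot-product attention with an extra temperature factor tau."""
    d_k = q.size(-1)
    # Dividing the logits by tau > 1 flattens the attention distribution;
    # tau < 1 sharpens it toward the largest logit.
    scores = q @ k.transpose(-2, -1) / (d_k ** 0.5 * tau)
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Example: compare a sharpened and a smoothed attention distribution.
q = torch.randn(1, 4, 8)   # (batch, query_len, d_k)
k = torch.randn(1, 6, 8)   # (batch, key_len, d_k)
v = torch.randn(1, 6, 8)
out_sharp = scaled_attention(q, k, v, tau=0.5)
out_smooth = scaled_attention(q, k, v, tau=2.0)
```

A higher temperature spreads attention weights more evenly over the keys, while a lower one concentrates them; the abstract's claim is that manipulating this value when generating pseudo labels makes them easier for smaller student models to learn.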
Pages: 127-141
Number of pages: 15
Related Papers
50 records in total
  • [21] Controllable Abstractive Summarization
    Fan, Angela
    Grangier, David
    Auli, Michael
    NEURAL MACHINE TRANSLATION AND GENERATION, 2018: 45-54
  • [22] Diversity driven attention model for query-based abstractive summarization
    Nema, Preksha
    Khapra, Mitesh M.
    Laha, Anirban
    Ravindran, Balaraman
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017: 1063-1072
  • [23] Neural Attention Model for Abstractive Text Summarization Using Linguistic Feature Space
    Dilawari, Aniqa
    Khan, Muhammad Usman Ghani
    Saleem, Summra
    Zahoor-Ur-Rehman
    Shaikh, Fatema Sabeen
    IEEE ACCESS, 2023, 11: 23557-23564
  • [24] An Abstractive Summarization Model Based on Joint-Attention Mechanism and a Priori Knowledge
    Li, Yuanyuan
    Huang, Yuan
    Huang, Weijian
    Yu, Junhao
    Huang, Zheng
    APPLIED SCIENCES-BASEL, 2023, 13 (07)
  • [25] See, hear, read: Leveraging multimodality with guided attention for abstractive text summarization
    Atri, Yash Kumar
    Pramanick, Shraman
    Goyal, Vikram
    Chakraborty, Tanmoy
    KNOWLEDGE-BASED SYSTEMS, 2021, 227
  • [26] Extractive-Abstractive Summarization of Judgment Documents Using Multiple Attention Networks
    Gao, Yan
    Liu, Zhengtao
    Li, Juan
    Guo, Fan
    Xiao, Fei
    LOGIC AND ARGUMENTATION, CLAR 2021, 2021, 13040: 486-494
  • [27] Summary-aware attention for social media short text abstractive summarization
    Wang, Qianlong
    Ren, Jiangtao
    NEUROCOMPUTING, 2021, 425: 290-299
  • [28] KAAS: A Keyword-Aware Attention Abstractive Summarization Model for Scientific Articles
    Li, Shuaimin
    Xu, Jungang
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT III, 2022: 263-271
  • [29] Abstractive Event Summarization on Twitter
    Li, Quanzhi
    Zhang, Qiong
    WWW'20: COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2020, 2020: 22-23
  • [30] Source Identification in Abstractive Summarization
    Suhara, Yoshi
    Alikaniotis, Dimitris
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024: 212-224