Attention Temperature Matters in Abstractive Summarization Distillation

Cited by: 0
Authors
Zhang, Shengqiang [1 ]
Zhang, Xingxing [2 ]
Bao, Hangbo [2 ]
Wei, Furu [2 ]
Affiliations
[1] Peking Univ, Beijing, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Keywords: (none listed)
DOI: (not available)
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Recent progress of abstractive text summarization largely relies on large pre-trained sequence-to-sequence Transformer models, which are computationally expensive. This paper aims to distill these large models into smaller ones for faster inference and with minimal performance loss. Pseudo-labeling based methods are popular in sequence-to-sequence model distillation. In this paper, we find simply manipulating attention temperatures in Transformers can make pseudo labels easier to learn for student models. Our experiments on three summarization datasets show our proposed method consistently improves vanilla pseudo-labeling based methods. Further empirical analysis shows that both pseudo labels and summaries produced by our students are shorter and more abstractive. Our code is available at https://github.com/Shengqiang-Zhang/plate.
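The abstract describes manipulating attention temperatures in Transformers so that teacher-generated pseudo labels become easier for a student to learn. As a rough illustration only (not the authors' actual implementation, which is in the linked repository), here is a minimal NumPy sketch of dot-product attention with an explicit temperature knob; the function names and the single-head setup are assumptions for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, temperature=1.0):
    """Dot-product attention with a temperature factor.

    temperature > 1 flattens the attention distribution (softer,
    more spread-out weights); temperature < 1 sharpens it.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / (np.sqrt(d_k) * temperature)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights
```

Raising the temperature divides the logits by a larger constant before the softmax, so the resulting attention weights are closer to uniform; this is the generic mechanism the paper builds on, applied here to a single head for clarity.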
Pages: 127-141 (15 pages)