END-TO-END SPEECH SUMMARIZATION USING RESTRICTED SELF-ATTENTION

Cited by: 8
Authors
Sharma, Roshan [1]
Palaskar, Shruti [1]
Black, Alan W. [1]
Metze, Florian [1]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
speech summarization; end-to-end; long sequence modeling; concept learning
DOI
10.1109/ICASSP43922.2022.9747320
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech summarization is typically performed by using a cascade of speech recognition and text summarization models. End-to-end modeling of speech summarization is challenging due to memory and compute constraints arising from long input audio sequences. Recent work in document summarization has inspired methods to reduce the complexity of self-attention, which enables transformer models to handle long sequences. In this work, we introduce a single model optimized end-to-end for speech summarization. We apply the restricted self-attention technique from text-based models to speech models to address the memory and compute constraints. We demonstrate that the proposed model learns to directly summarize speech for the How-2 corpus of instructional videos. The proposed end-to-end model outperforms the previously proposed cascaded model by 3 points absolute on ROUGE. Further, we consider the spoken language understanding task of predicting concepts from speech inputs and show that the proposed end-to-end model outperforms the cascade model by 4 points absolute in F-1.
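The abstract does not spell out how the restricted self-attention is configured, so the following is only a minimal sketch of the general idea: each position attends to neighbors within a fixed window instead of to the whole sequence, which reduces the number of scored pairs from T*T to roughly T*(2*window+1). The function names, the `window` parameter, and the use of the raw inputs as queries, keys, and values (no learned projections, single head) are assumptions made for illustration, not the authors' implementation.

```python
import math


def restricted_self_attention(x, window):
    """Single-head self-attention restricted to a local window (illustrative sketch).

    x: list of T feature vectors (lists of floats). For simplicity the same
       vectors serve as queries, keys and values (an assumption; a real model
       would apply learned projections).
    window: position i attends only to positions j with |i - j| <= window.
    """
    T = len(x)
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(T):
        lo = max(0, i - window)
        hi = min(T, i + window + 1)
        # Scaled dot-product scores against the in-window keys only.
        scores = [scale * sum(a * b for a, b in zip(x[i], x[j]))
                  for j in range(lo, hi)]
        # Numerically stable softmax over the window.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Weighted sum of the in-window values.
        ctx = [sum((w / z) * x[j][k] for w, j in zip(weights, range(lo, hi)))
               for k in range(d)]
        out.append(ctx)
    return out


def attended_pairs(T, window):
    """Number of (query, key) pairs scored for a sequence of length T."""
    return sum(min(T, i + window + 1) - max(0, i - window) for i in range(T))
```

With this sketch, `attended_pairs(T, window)` grows linearly in T for a fixed window, whereas a window covering the whole sequence recovers the quadratic cost of full self-attention. That difference is the kind of memory and compute saving the abstract refers to for long audio inputs.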
Pages: 8072-8076
Page count: 5