END-TO-END SPEECH SUMMARIZATION USING RESTRICTED SELF-ATTENTION

Cited by: 8
Authors
Sharma, Roshan [1 ]
Palaskar, Shruti [1 ]
Black, Alan W. [1 ]
Metze, Florian [1 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
speech summarization; end-to-end; long sequence modeling; concept learning;
DOI
10.1109/ICASSP43922.2022.9747320
CLC number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Speech summarization is typically performed with a cascade of speech recognition and text summarization models. End-to-end modeling of speech summarization is challenging due to the memory and compute constraints that arise from long input audio sequences. Recent work in document summarization has inspired methods that reduce the complexity of self-attention, enabling transformer models to handle long sequences. In this work, we introduce a single model optimized end-to-end for speech summarization. We apply the restricted self-attention technique from text-based models to speech models to address the memory and compute constraints. We demonstrate that the proposed model learns to directly summarize speech on the How-2 corpus of instructional videos. The proposed end-to-end model outperforms the previously proposed cascaded model by 3 points absolute on ROUGE. Further, we consider the spoken language understanding task of predicting concepts from speech inputs and show that the proposed end-to-end model outperforms the cascade model by 4 points absolute F-1.
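The restricted self-attention referred to in the abstract limits each position to attending only within a local window, which reduces the quadratic cost of full attention on long audio sequences. The sketch below is a minimal NumPy illustration of this idea (single head, no learned projections); the function name, window parameter, and masking scheme are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def restricted_self_attention(x, window):
    """Scaled dot-product self-attention where each position may only
    attend to positions within +/- `window` steps (a local band mask).
    Illustrative sketch: single head, no query/key/value projections."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)                        # (T, T) attention logits
    idx = np.arange(T)
    outside = np.abs(idx[:, None] - idx[None, :]) > window
    scores = np.where(outside, -np.inf, scores)          # mask out distant positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ x, weights

# Usage: with window=1, each frame attends only to itself and its neighbors.
x = np.random.default_rng(0).normal(size=(6, 4))
out, w = restricted_self_attention(x, window=1)
```

With a fixed window, the number of non-masked score entries grows linearly in the sequence length rather than quadratically, which is what makes long input audio tractable.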
Pages: 8072-8076
Page count: 5