END-TO-END SPEECH SUMMARIZATION USING RESTRICTED SELF-ATTENTION

Cited by: 8
Authors
Sharma, Roshan [1]
Palaskar, Shruti [1]
Black, Alan W. [1]
Metze, Florian [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
speech summarization; end-to-end; long sequence modeling; concept learning
DOI
10.1109/ICASSP43922.2022.9747320
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech summarization is typically performed using a cascade of speech recognition and text summarization models. End-to-end modeling of speech summarization is challenging because long input audio sequences impose memory and compute constraints. Recent work in document summarization has inspired methods that reduce the complexity of self-attention, enabling transformer models to handle long sequences. In this work, we introduce a single model optimized end-to-end for speech summarization. We apply the restricted self-attention technique from text-based models to speech models to address the memory and compute constraints. We demonstrate that the proposed model learns to directly summarize speech for the How-2 corpus of instructional videos. The proposed end-to-end model outperforms the previously proposed cascaded model by 3 points absolute on ROUGE. Further, we consider the spoken language understanding task of predicting concepts from speech inputs and show that the proposed end-to-end model outperforms the cascade model by 4 points absolute on F-1.
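The abstract does not spell out how restricted self-attention is formulated, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a symmetric window (the hypothetical `window` parameter) in which each frame attends only to neighbors within that distance, and it uses the input frames directly as queries, keys, and values with no learned projections. Under those assumptions, the score storage per frame grows with the window size rather than with the full sequence length.

```python
import numpy as np

def restricted_self_attention(x, window):
    """Single-head self-attention where frame i attends only to frames j
    with |i - j| <= window. Assumption: queries, keys, and values are the
    raw inputs (no learned projections), purely for illustration."""
    n, d = x.shape
    out = np.empty_like(x, dtype=float)
    for i in range(n):
        # Slice out the local neighborhood instead of scoring all n frames.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        local = x[lo:hi]
        scores = local @ x[i] / np.sqrt(d)
        scores = scores - scores.max()  # numerical stability for softmax
        weights = np.exp(scores)
        weights /= weights.sum()
        out[i] = weights @ local
    return out
```

With `window` at least as large as the sequence length, this reduces to ordinary full self-attention; with `window = 0`, each frame attends only to itself and the output equals the input.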
Pages: 8072-8076
Page count: 5