Transformer-Based End-to-End Speech Translation With Rotary Position Embedding

Cited by: 2
Authors
Li, Xueqing [1 ]
Li, Shengqiang [1 ]
Zhang, Xiao-Lei [1 ,2 ]
Rahardja, Susanto [1 ,3 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Res & Dev Inst, Shenzhen 710072, Peoples R China
[3] Singapore Inst Technol, Engn Cluster, Singapore 138683, Singapore
Funding
US National Science Foundation;
Keywords
End-to-end speech translation; rotary position embedding; Transformer;
DOI
10.1109/LSP.2024.3353039
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Recently, many Transformer-based models have been applied to end-to-end speech translation because of their capability to model global dependencies. Position embedding is crucial in Transformer models, as it enables the modeling of dependencies between elements at different positions within the input sequence. Most position embedding methods employed in speech translation, such as absolute and relative position embedding, either struggle to leverage relative positional information or add computational burden to the model. In this letter, we introduce a novel approach by incorporating rotary position embedding into Transformer-based speech translation (RoPE-ST). RoPE-ST first adds absolute position information by multiplying the input vectors with rotation matrices, and then captures relative positions through the dot product of the self-attention mechanism. The main advantage of the proposed method is that rotary position embedding combines the benefits of absolute and relative position embedding, making it well suited to speech translation tasks. We conduct experiments on the multilingual speech translation corpus MuST-C. Results show that RoPE-ST achieves an average improvement of 2.91 BLEU over the baseline without rotary position embedding across eight translation directions.
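The rotary mechanism the abstract describes (absolute positions injected via rotation matrices, with relative positions recovered by the attention dot product) can be sketched in NumPy. This is an illustrative reimplementation of the standard rotary position embedding from the original RoFormer formulation, not the authors' code; the function name `rope`, the even/odd dimension pairing, and the base of 10000 are conventional assumptions.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to vector x at integer position `pos`.

    Each pair of dimensions (2i, 2i+1) is rotated by the angle pos * theta_i,
    where theta_i = base**(-2i/d). Rotations preserve the vector norm.
    """
    d = x.shape[-1]
    theta = base ** (-np.arange(d // 2) * 2.0 / d)  # per-pair frequencies
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                       # interleaved pairs
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin                 # 2x2 rotation per pair
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
k = rng.standard_normal(8)

# The attention score <rope(q, m), rope(k, n)> depends only on m - n,
# which is how absolute rotations yield relative position information:
s1 = rope(q, 5) @ rope(k, 3)     # offset m - n = 2
s2 = rope(q, 12) @ rope(k, 10)   # same offset, different absolute positions
print(np.isclose(s1, s2))        # True
```

Because each rotation is orthogonal, the embedding also leaves vector norms unchanged, so no extra parameters or position-dependent bias terms enter the attention computation.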
Pages: 371-375
Page count: 5
Related Papers
50 items in total
  • [21] Fast offline transformer-based end-to-end automatic speech recognition for real-world applications
    Oh, Yoo Rhee
    Park, Kiyoung
    Park, Jeon Gue
    ETRI JOURNAL, 2022, 44 (03) : 476 - 490
  • [22] MULTILINGUAL END-TO-END SPEECH TRANSLATION
    Inaguma, Hirofumi
    Duh, Kevin
    Kawahara, Tatsuya
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 570 - 577
  • [23] Optimally Encoding Inductive Biases into the Transformer Improves End-to-End Speech Translation
    Vyas, Piyush
    Kuznetsova, Anastasia
    Williamson, Donald S.
    INTERSPEECH 2021, 2021, : 2287 - 2291
  • [24] SymFormer: End-to-End Symbolic Regression Using Transformer-Based Architecture
    Vastl, Martin
    Kulhanek, Jonas
    Kubalik, Jiri
    Derner, Erik
    Babuska, Robert
    IEEE ACCESS, 2024, 12 : 37840 - 37849
  • [25] End-to-end information fusion method for transformer-based stereo matching
    Xu, Zhenghui
    Wang, Jingxue
    Guo, Jun
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (06)
  • [26] Semantic Mask for Transformer based End-to-End Speech Recognition
    Wang, Chengyi
    Wu, Yu
    Du, Yujiao
    Li, Jinyu
    Liu, Shujie
    Lu, Liang
    Ren, Shuo
    Ye, Guoli
    Zhao, Sheng
    Zhou, Ming
    INTERSPEECH 2020, 2020, : 971 - 975
  • [27] End to end transformer-based contextual speech recognition based on pointer network
    Lin, Binghuai
    Wang, Liyuan
    INTERSPEECH 2021, 2021, : 2087 - 2091
  • [28] End-to-End Speech Translation for Code Switched Speech
    Weller, Orion
    Sperber, Matthias
    Pires, Telmo
    Setiawan, Hendra
    Gollan, Christian
    Telaar, Dominic
    Paulik, Matthias
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1435 - 1448
  • [29] RealTranS: End-to-End Simultaneous Speech Translation with Convolutional Weighted-Shrinking Transformer
    Zeng, Xingshan
    Li, Liangyou
    Liu, Qun
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2461 - 2474
  • [30] Transformer-based end-to-end attack on text CAPTCHAs with triplet deep attention
    Zhang, Bo
    Xiong, Yu-Jie
    Xia, Chunming
    Gao, Yongbin
    COMPUTERS & SECURITY, 2024, 146