Transformer-Based End-to-End Speech Translation With Rotary Position Embedding

Cited by: 2
Authors
Li, Xueqing [1 ]
Li, Shengqiang [1 ]
Zhang, Xiao-Lei [1 ,2 ]
Rahardja, Susanto [1 ,3 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Marine Sci & Technol, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Res & Dev Inst, Shenzhen 710072, Peoples R China
[3] Singapore Inst Technol, Engn Cluster, Singapore 138683, Singapore
Funding
US National Science Foundation;
Keywords
End-to-end speech translation; rotary position embedding; Transformer;
DOI
10.1109/LSP.2024.3353039
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Recently, many Transformer-based models have been applied to end-to-end speech translation because of their capability to model global dependencies. Position embedding is crucial in Transformer models, as it enables the modeling of dependencies between elements at different positions within the input sequence. Most position embedding methods employed in speech translation, such as absolute and relative position embedding, either struggle to leverage relative positional information or add computational burden to the model. In this letter, we introduce a novel approach by incorporating rotary position embedding into Transformer-based speech translation (RoPE-ST). RoPE-ST first adds absolute position information by multiplying the input vectors with rotation matrices, and then captures relative positions through the dot product of the self-attention mechanism. The main advantage of the proposed method is that rotary position embedding combines the benefits of absolute and relative position embedding, making it well suited to speech translation tasks. We conduct experiments on the multilingual speech translation corpus MuST-C. Results show that RoPE-ST achieves an average improvement of 2.91 BLEU over the baseline without rotary position embedding across eight translation directions.
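The rotary mechanism the abstract describes (absolute positions injected via rotation matrices, with relative positions recovered by the attention dot product) can be sketched in NumPy. This is an illustrative reimplementation of the standard rotary position embedding from the original RoFormer formulation, not the authors' code; the function name `rope`, the even/odd dimension pairing, and the base of 10000 are conventional assumptions.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to vector x at integer position `pos`.

    Each pair of dimensions (2i, 2i+1) is rotated by the angle pos * theta_i,
    where theta_i = base**(-2i/d). Rotations preserve the vector norm.
    """
    d = x.shape[-1]
    theta = base ** (-np.arange(d // 2) * 2.0 / d)  # per-pair frequencies
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]                       # interleaved pairs
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin                 # 2x2 rotation per pair
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
k = rng.standard_normal(8)

# The attention score <rope(q, m), rope(k, n)> depends only on m - n,
# which is how absolute rotations yield relative position information:
s1 = rope(q, 5) @ rope(k, 3)     # offset m - n = 2
s2 = rope(q, 12) @ rope(k, 10)   # same offset, different absolute positions
print(np.isclose(s1, s2))        # True
```

Because each rotation is orthogonal, the embedding also leaves vector norms unchanged, so no extra parameters or position-dependent bias terms enter the attention computation.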
Pages: 371-375
Page count: 5
Related Papers
50 items in total
  • [21] Fast offline transformer-based end-to-end automatic speech recognition for real-world applications
    Oh, Yoo Rhee
    Park, Kiyoung
    Park, Jeon Gue
    ETRI JOURNAL, 2022, 44 (03) : 476 - 490
  • [22] MULTILINGUAL END-TO-END SPEECH TRANSLATION
    Inaguma, Hirofumi
    Duh, Kevin
    Kawahara, Tatsuya
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 570 - 577
  • [23] Optimally Encoding Inductive Biases into the Transformer Improves End-to-End Speech Translation
    Vyas, Piyush
    Kuznetsova, Anastasia
    Williamson, Donald S.
    INTERSPEECH 2021, 2021, : 2287 - 2291
  • [24] SymFormer: End-to-End Symbolic Regression Using Transformer-Based Architecture
    Vastl, Martin
    Kulhanek, Jonas
    Kubalik, Jiri
    Derner, Erik
    Babuska, Robert
    IEEE ACCESS, 2024, 12 : 37840 - 37849
  • [25] End-to-end information fusion method for transformer-based stereo matching
    Xu, Zhenghui
    Wang, Jingxue
    Guo, Jun
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2024, 35 (06)
  • [26] Semantic Mask for Transformer based End-to-End Speech Recognition
    Wang, Chengyi
    Wu, Yu
    Du, Yujiao
    Li, Jinyu
    Liu, Shujie
    Lu, Liang
    Ren, Shuo
    Ye, Guoli
    Zhao, Sheng
    Zhou, Ming
    INTERSPEECH 2020, 2020, : 971 - 975
  • [27] End to end transformer-based contextual speech recognition based on pointer network
    Lin, Binghuai
    Wang, Liyuan
    INTERSPEECH 2021, 2021, : 2087 - 2091
  • [28] End-to-End Speech Translation for Code Switched Speech
    Weller, Orion
    Sperber, Matthias
    Pires, Telmo
    Setiawan, Hendra
    Gollan, Christian
    Telaar, Dominic
    Paulik, Matthias
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1435 - 1448
  • [29] RealTranS: End-to-End Simultaneous Speech Translation with Convolutional Weighted-Shrinking Transformer
    Zeng, Xingshan
    Li, Liangyou
    Liu, Qun
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2461 - 2474
  • [30] Transformer-based end-to-end attack on text CAPTCHAs with triplet deep attention
    Zhang, Bo
    Xiong, Yu-Jie
    Xia, Chunming
    Gao, Yongbin
    COMPUTERS & SECURITY, 2024, 146