Analysis of Acoustic Information in End-to-End Spoken Language Translation

Times cited: 0

Authors:

Sant, Gerard [1,2,3]

Escolano, Carlos [1]

Affiliations:

[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain

[2] Barcelona Supercomp Ctr, Barcelona, Spain

[3] UPC, Barcelona, Spain

Source:

INTERSPEECH 2023 | 2023

Keywords:

Spoken Language Translation; Interpretability of Acoustic Information

DOI:

10.21437/Interspeech.2023-2050

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

End-to-end Transformer-based models are the most popular approach for Spoken Language Translation (SLT). Although these models obtain state-of-the-art results, we are still far from understanding how they extract acoustic information from the data and how that information is transformed into semantic representations. In this paper, we seek to provide a better understanding of the flow of acoustic information through speech-to-text translation models. By means of the Speaker Classification and Spectrogram Reconstruction tasks, this study (i) interprets the main role of the encoder with respect to the acoustic features, (ii) highlights the importance of the acoustic information throughout the model and its transfer between encoder and decoder, (iii) reveals the significant effect of downsampling convolutional layers on learning acoustic features, and (iv) finally, observes a strong correlation between the semantic domain and the speaker labels in MuST-C.

Pages: 52-56

Page count: 5
