Analysis of Acoustic Information in End-to-End Spoken Language Translation

Authors
Sant, Gerard [1, 2, 3]
Escolano, Carlos [1]
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain
[2] Barcelona Supercomp Ctr, Barcelona, Spain
[3] UPC, Barcelona, Spain
Source
INTERSPEECH 2023
Keywords
Spoken Language Translation; Interpretability of Acoustic Information
DOI
10.21437/Interspeech.2023-2050
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
End-to-end Transformer-based models are the most popular approach for Spoken Language Translation (SLT). Although these models obtain state-of-the-art results, we are still far from understanding how they extract acoustic information from the data and how that information is transformed into semantic representations. In this paper, we seek to provide a better understanding of the flow of acoustic information through speech-to-text translation models. Using the Speaker Classification and Spectrogram Reconstruction tasks, this study (i) interprets the main role of the encoder with respect to the acoustic features, (ii) highlights the importance of acoustic information throughout the model and its transfer between encoder and decoder, (iii) reveals the significant effect of the downsampling convolutional layers on learning acoustic features, and (iv) shows a strong correlation between the semantic domain and the speaker labels in MuST-C.
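The abstract does not state how the downsampling convolutional layers are configured. As a rough illustration of why they matter for acoustic detail, the sketch below computes how a stack of strided 1-D convolutions shortens the input frame sequence, using the standard output-length formula documented for `torch.nn.Conv1d`. The two-layer setup (kernel 5, stride 2, padding 2) and the 10 ms frame shift are assumptions chosen for illustration, not the paper's reported settings, and `conv1d_out_len` and `subsampled_len` are hypothetical helper names.

```python
def conv1d_out_len(n_in: int, kernel: int, stride: int, padding: int, dilation: int = 1) -> int:
    """Output length of a 1-D convolution (same formula as torch.nn.Conv1d)."""
    return (n_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def subsampled_len(n_frames: int, layers: list[tuple[int, int, int]]) -> int:
    """Apply the length formula once per (kernel, stride, padding) layer."""
    n = n_frames
    for kernel, stride, padding in layers:
        n = conv1d_out_len(n, kernel, stride, padding)
    return n


# Assumed (hypothetical) front end: two conv layers, kernel 5, stride 2, padding 2.
LAYERS = [(5, 2, 2), (5, 2, 2)]
# Assumed 10 ms shift between input filterbank frames.
FRAME_SHIFT_MS = 10

if __name__ == "__main__":
    n_in = 1000  # about 10 s of audio at a 10 ms frame shift
    n_out = subsampled_len(n_in, LAYERS)
    # Average duration of audio represented by each encoder position.
    shift_out_ms = FRAME_SHIFT_MS * n_in / n_out
    print(n_out, shift_out_ms)  # 250 40.0
```

Under these assumed settings, each encoder position summarizes about 40 ms of audio instead of 10 ms, so the Transformer layers see a sequence four times shorter than the spectrogram, which is one reason these layers can shape which acoustic features survive into the encoder.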
Pages: 52-56
Number of pages: 5