BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric

Authors
Chen, Mingda [1]
Duquenne, Paul-Ambroise [1]
Andrews, Pierre [1]
Kao, Justine [1]
Mourachko, Alexandre [1]
Schwenk, Holger [1]
Costa-Jussa, Marta R. [1]
Affiliations
[1] Meta AI, Menlo Park, CA 94025 USA
Source
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1 | 2023
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
End-to-end speech-to-speech translation (S2ST) is generally evaluated with text-based metrics. This means that generated speech has to be automatically transcribed, making the evaluation dependent on the availability and quality of automatic speech recognition (ASR) systems. In this paper, we propose a text-free evaluation metric for end-to-end S2ST, named BLASER, to avoid the dependency on ASR systems. BLASER leverages a multilingual multimodal encoder to directly encode the speech segments for source input, translation output and reference into a shared embedding space and computes a score of the translation quality that can be used as a proxy for human evaluation. To evaluate our approach, we construct training and evaluation sets from more than 40k human annotations covering seven language directions. The best results of BLASER are achieved by training with supervision from human rating scores. We show that when evaluated at the sentence level, BLASER correlates significantly better with human judgment than ASR-dependent metrics, including ASR-SENTBLEU in all translation directions and ASR-COMET in five of them. Our analysis shows that combining speech and text as inputs to BLASER does not increase the correlation with human scores, and that the best correlations are achieved when using speech, which motivates the goal of our research. Moreover, we show that using ASR for references is detrimental for text-based metrics.
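Illustrative note: the abstract describes scoring a translation from three embeddings (source, translation output, reference) that live in one shared space, but it does not state a formula. The sketch below shows one simple, unsupervised way such a score could be computed, by summing two cosine similarities. The function names, the choice of cosine similarity, and the toy vectors are assumptions made for illustration only; they are not taken from this record, and the trained variant that the abstract reports as best (supervised by human ratings) is not shown.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def embedding_similarity_score(
    src: Sequence[float],
    mt: Sequence[float],
    ref: Sequence[float],
) -> float:
    """Hypothetical text-free score: how close the translation embedding (mt)
    is to the source embedding and to the reference embedding.

    Assumption: all three vectors come from the same multilingual speech
    encoder, so distances between them are meaningful. The range is [-2, 2].
    """
    return cosine(src, mt) + cosine(ref, mt)


# Toy 3-dimensional vectors standing in for real speech embeddings.
source_vec = [0.9, 0.1, 0.0]
translation_vec = [0.8, 0.2, 0.1]
reference_vec = [0.85, 0.15, 0.05]

score = embedding_similarity_score(source_vec, translation_vec, reference_vec)
print(f"similarity-based score: {score:.3f}")
```

In practice the placeholder vectors would be replaced by the outputs of a speech encoder, which this sketch deliberately leaves out.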
Pages: 9064-9079
Page count: 16