AVSE CHALLENGE: AUDIO-VISUAL SPEECH ENHANCEMENT CHALLENGE

Times cited: 4
Authors
Blanco, Andrea Lorena Aldana [1]
Valentini-Botinhao, Cassia [1]
Klejch, Ondrej [1]
Gogate, Mandar [2]
Dashtipour, Kia [2]
Hussain, Amir [2]
Bell, Peter [1]
Affiliations
[1] University of Edinburgh, Edinburgh, Midlothian, Scotland
[2] Edinburgh Napier University, Edinburgh, Midlothian, Scotland
Source
2022 IEEE Spoken Language Technology Workshop (SLT), 2022
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Audio-visual speech enhancement; subjective intelligibility; LRS3; dataset
DOI
10.1109/SLT54892.2023.10023284
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Audio-visual speech enhancement is the task of improving the quality of a speech signal when video of the speaker is available. It opens up the opportunity to improve speech intelligibility in adverse listening scenarios that are currently too challenging for audio-only speech enhancement models. The Audio-Visual Speech Enhancement (AVSE) challenge aims to set the first benchmark in this area. We provide participants with datasets and scripts to train and evaluate their audio-visual speech enhancement models under a common framework. The data is derived from real-world videos and comprises noisy mixtures in which audio from the target speaker is mixed with either a competing speaker or a noise signal. The submitted systems are evaluated through audio-visual (AV) intelligibility tests involving human participants. We expect this challenge to serve as a platform for advancing the field of audio-visual speech enhancement and to provide further insight into the scope and limitations of current AV speech enhancement approaches.
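The abstract states that each noisy mixture combines the target speaker's audio with either a competing speaker or a noise signal. A common way to build such a mixture is to scale the interferer to a chosen signal-to-noise ratio (SNR) and add it to the target. The sketch below illustrates that idea in plain Python; it is not the challenge's released script. The function names `mix_at_snr` and `measured_snr_db`, the SNR value, and the requirement that both signals be equal-length, time-aligned mono sample sequences are all hypothetical assumptions of this example.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(target, interferer, snr_db):
    """Hypothetical helper: scale `interferer` so the target-to-interferer
    power ratio equals `snr_db`, then add it to `target` sample by sample.

    Assumes equal-length, time-aligned, single-channel inputs at the same
    sampling rate (an assumption of this sketch, not of the challenge).
    """
    if len(target) != len(interferer):
        raise ValueError("target and interferer must have the same length")
    interferer_rms = rms(interferer)
    if interferer_rms == 0:
        raise ValueError("interferer is silent; SNR is undefined")
    # Desired interferer RMS is rms(target) / 10^(snr_db / 20).
    gain = rms(target) / (interferer_rms * 10 ** (snr_db / 20))
    scaled = [gain * n for n in interferer]
    mixture = [t + n for t, n in zip(target, scaled)]
    return mixture, scaled


def measured_snr_db(target, scaled_interferer):
    """SNR in dB between the target and the scaled interferer."""
    return 20 * math.log10(rms(target) / rms(scaled_interferer))


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate for the synthetic example
    # Synthetic stand-ins: a 220 Hz tone as "speech", Gaussian noise as interferer.
    target = [math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(sr)]
    mixture, scaled = mix_at_snr(target, noise, snr_db=-5.0)
    print(round(measured_snr_db(target, scaled), 6))
```

Scaling the interferer rather than the target keeps the target speaker's level fixed across conditions, so only the interference level varies with the chosen SNR; the same function applies whether the interferer is a competing talker or a noise recording.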
Pages: 465-471
Number of pages: 7