AVSE CHALLENGE: AUDIO-VISUAL SPEECH ENHANCEMENT CHALLENGE

Cited by: 4
Authors
Blanco, Andrea Lorena Aldana [1]
Valentini-Botinhao, Cassia [1]
Klejch, Ondrej [1]
Gogate, Mandar [2]
Dashtipour, Kia [2]
Hussain, Amir [2]
Bell, Peter [1]
Affiliations
[1] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[2] Edinburgh Napier Univ, Edinburgh, Midlothian, Scotland
Source
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Audio-visual speech enhancement; subjective intelligibility; LRS3; dataset;
DOI
10.1109/SLT54892.2023.10023284
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Audio-visual speech enhancement is the task of improving the quality of a speech signal when video of the speaker is available. It opens up the opportunity to improve speech intelligibility in adverse listening scenarios that are currently too challenging for audio-only speech enhancement models. The Audio-Visual Speech Enhancement (AVSE) challenge aims to set the first benchmark in this area. We provide participants with datasets and scripts to test their audio-visual speech enhancement models under a common framework for both training and evaluation. The data is derived from real-world videos and comprises noisy mixes in which audio from a target speaker is mixed with either a competing speaker or a noise signal. The submitted systems are evaluated by conducting AV intelligibility tests involving human participants. We expect this challenge to be a platform for advancing the field of audio-visual speech enhancement and to provide further insight into the scope and limitations of current AV speech enhancement approaches.
Pages: 465-471
Number of pages: 7