AVSE CHALLENGE: AUDIO-VISUAL SPEECH ENHANCEMENT CHALLENGE

Cited by: 4

Authors:

Blanco, Andrea Lorena Aldana [1]

Valentini-Botinhao, Cassia [1]

Klejch, Ondrej [1]

Gogate, Mandar [2]

Dashtipour, Kia [2]

Hussain, Amir [2]

Bell, Peter [1]

Affiliations:

[1] Univ Edinburgh, Edinburgh, Midlothian, Scotland

[2] Edinburgh Napier Univ, Edinburgh, Midlothian, Scotland

Source:

2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Audio-visual speech enhancement; subjective intelligibility; LRS3; dataset;

DOI:

10.1109/SLT54892.2023.10023284

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Audio-visual speech enhancement is the task of improving the quality of a speech signal when video of the speaker is available. It opens up the opportunity of improving speech intelligibility in adverse listening scenarios that are currently too challenging for audio-only speech enhancement models. The Audio-Visual Speech Enhancement (AVSE) challenge aims to set the first benchmark in this area. We provide participants with datasets and scripts to test their audio-visual speech enhancement models under a common framework for both training and evaluation. The data is derived from real-world videos and comprises noisy mixes, in which audio from a target speaker is mixed with either a competing speaker or a noise signal. The submitted systems are evaluated by conducting AV intelligibility tests involving human participants. We expect this challenge to be a platform for advancing the field of audio-visual speech enhancement and to provide further insight into the scope and limitations of current AV speech enhancement approaches.


Pages: 465-471

Page count: 7
