AVSE CHALLENGE: AUDIO-VISUAL SPEECH ENHANCEMENT CHALLENGE

Cited by: 4

Authors:

Blanco, Andrea Lorena Aldana [1]

Valentini-Botinhao, Cassia [1]

Klejch, Ondrej [1]

Gogate, Mandar [2]

Dashtipour, Kia [2]

Hussain, Amir [2]

Bell, Peter [1]

Affiliations:

[1] Univ Edinburgh, Edinburgh, Midlothian, Scotland

[2] Edinburgh Napier Univ, Edinburgh, Midlothian, Scotland

Source:

2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

Audio-visual speech enhancement; subjective intelligibility; LRS3; dataset;

DOI:

10.1109/SLT54892.2023.10023284

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Audio-visual speech enhancement is the task of improving the quality of a speech signal when video of the speaker is available. It opens up the opportunity of improving speech intelligibility in adverse listening scenarios that are currently too challenging for audio-only speech enhancement models. The Audio-Visual Speech Enhancement (AVSE) challenge aims to set the first benchmark in this area. We provide participants with datasets and scripts to test their audio-visual speech enhancement models under a common framework for both training and evaluation. The data is derived from real-world videos and comprises noisy mixes in which audio from a target speaker is mixed with either a competing speaker or a noise signal. The submitted systems are evaluated by conducting AV intelligibility tests involving human participants. We expect this challenge to be a platform for advancing the field of audio-visual speech enhancement and to provide further insight into the scope and limitations of current AV speech enhancement approaches.

Pages: 465-471

Number of pages: 7
