The Conversation: Deep Audio -Visual Speech Enhancement

被引:0
|
作者
Afouras, Triantafyllos [1 ]
Chung, Joon Son [1 ]
Zisserman, Andrew [1 ]
机构
[1] Univ Oxford, Dept Engn Sci, Visual Geometry Grp, Oxford, England
基金
英国工程与自然科学研究理事会;
关键词
speech enhancement; speech separation;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Our goal is to isolate individual speakers from multi-talker simultaneous speech in videos. Existing works in this area have focussed on trying to separate utterances from known speakers in controlled environments. In this paper, we propose a deep audio-visual speech enhancement network that is able to separate a speaker's voice given lip regions in the corresponding video, by predicting both the magnitude and the phase of the target signal. The method is applicable to speakers unheard and unseen during training, and for unconstrained environments. We demonstrate strong quantitative and qualitative results, isolating extremely challenging real-world examples.
引用
收藏
页码:3244 / 3248
页数:5
相关论文
共 50 条
  • [1] Audio-Visual Speech Enhancement using Deep Neural Networks
    Hou, Jen-Cheng
    Wang, Syu-Siang
    Lai, Ying-Hui
    Lin, Jen-Chun
    Tsao, Yu
    Chang, Hsiu-Wen
    Wang, Hsin-Min
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [2] CONTINUOUS VISUAL SPEECH RECOGNITION FOR AUDIO SPEECH ENHANCEMENT
    Benhaim, Eric
    Sahbi, Hichem
    Vitte, Guillaume
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 2244 - 2248
  • [3] Lite Audio-Visual Speech Enhancement
    Chuang, Shang-Yi
    Tsao, Yu
    Lo, Chen-Chou
    Wang, Hsin-Min
    INTERSPEECH 2020, 2020, : 1131 - 1135
  • [4] Audio-visual enhancement of speech in noise
    Girin, L
    Schwartz, JL
    Feng, G
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2001, 109 (06): : 3007 - 3020
  • [5] An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
    Michelsanti, Daniel
    Tan, Zheng-Hua
    Zhang, Shi-Xiong
    Xu, Yong
    Yu, Meng
    Yu, Dong
    Jensen, Jesper
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1368 - 1396
  • [6] Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks
    Hou, Jen-Cheng
    Wang, Syu-Siang
    Lai, Ying-Hui
    Tsao, Yu
    Chang, Hsiu-Wen
    Wang, Hsin-Min
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2018, 2 (02): : 117 - 128
  • [7] Deep Audio-Visual Speech Recognition
    Afouras, Triantafyllos
    Chung, Joon Son
    Senior, Andrew
    Vinyals, Oriol
    Zisserman, Andrew
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 8717 - 8727
  • [8] Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
    Yang, Karren
    Markovic, Dejan
    Krenn, Steven
    Agrawal, Vasu
    Richard, Alexander
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8217 - 8227
  • [9] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    APPLIED ACOUSTICS, 2023, 211
  • [10] A ROBUST AUDIO-VISUAL SPEECH ENHANCEMENT MODEL
    Wang, Wupeng
    Xing, Chao
    Wang, Dong
    Chen, Xiao
    Sun, Fengyu
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7529 - 7533