On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech

Cited by: 14
Authors
Mattheyses, Wesley [1]
Latacz, Lukas [1]
Verhelst, Werner [1]
Affiliations
[1] Vrije Univ Brussel, Interdisciplinary Inst Broadband Technol IBBT, Dept ETRO DSSP, B-1050 Brussels, Belgium
Keywords
SYNTHETIC TALKING FACES;
DOI
10.1155/2009/169819
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches between these two information streams requires experimental exploration since it could degrade the quality of the output. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as being more synchronous than those of systems in which both modes are not intrinsically coherent. Furthermore, it is shown that the degree of coherence between the auditory mode and the visual mode has an influence on the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived visual signal's quality. Copyright (C) 2009 Wesley Mattheyses et al.
Pages: 12
Related papers
50 in total
  • [1] On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech
    Wesley Mattheyses
    Lukas Latacz
    Werner Verhelst
    EURASIP Journal on Audio, Speech, and Music Processing, 2009
  • [2] THE COHERENCE OF SPEECH IN AUDIOVISUAL INTEGRATION
    ABRY, C
    CATHIARD, MA
    ROBERTRIBES, J
    SCHWARTZ, JL
    CAHIERS DE PSYCHOLOGIE COGNITIVE-CURRENT PSYCHOLOGY OF COGNITION, 1994, 13 (01) : 52 - 59
  • [3] Auditory, visual and audiovisual clear speech
    Gagné, JP
    Rochette, AJ
    Charest, M
    SPEECH COMMUNICATION, 2002, 37 (3-4) : 213 - 230
  • [4] fMRI imaging of visual and audiovisual speech
    Calvert, G
    Woodruff, P
    Wright, I
    Bullmore, E
    Brammer, M
    Williams, S
    Maguire, P
    Campbell, R
    Howard, R
    Simmons, A
    David, A
    INTERNATIONAL JOURNAL OF PSYCHOPHYSIOLOGY, 1997, 25 (01) : 23 - 23
  • [5] Auditory, visual, and audiovisual clear speech effects
    Gagne, JP
    Rochette, AJ
    INTERNATIONAL JOURNAL OF PSYCHOLOGY, 1996, 31 (3-4) : 4737 - 4737
  • [6] ON THE ROLE OF VISUAL CUES IN AUDIOVISUAL SPEECH ENHANCEMENT
    Aldeneh, Zakaria
    Kumar, Anushree Prasanna
    Theobald, Barry-John
    Marchi, Erik
    Kajarekar, Sachin
    Naik, Devang
    Abdelaziz, Ahmed Hussen
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021 : 8423 - 8427
  • [7] Visual attention modulates audiovisual speech perception
    Tiippana, K
    Andersen, TS
    Sams, M
    EUROPEAN JOURNAL OF COGNITIVE PSYCHOLOGY, 2004, 16 (03) : 457 - 472
  • [8] Effects of distance on visual and audiovisual speech recognition
    Jordan, TR
    Sergeant, P
    LANGUAGE AND SPEECH, 2000, 43 : 107 - 124
  • [9] Visual Hearing Aids: Artificial Visual Speech Stimuli for Audiovisual Speech Perception in Noise
    Choudhary, Zubin Datta
    Bruder, Gerd
    Welch, Gregory F.
    29TH ACM SYMPOSIUM ON VIRTUAL REALITY SOFTWARE AND TECHNOLOGY, VRST 2023, 2023
  • [10] Mismatch Negativity with Visual-only and Audiovisual Speech
    Curtis W. Ponton
    Lynne E. Bernstein
    Edward T. Auer
    Brain Topography, 2009, 21 : 207 - 215