On the Importance of Audiovisual Coherence for the Perceived Quality of Synthesized Visual Speech

Cited by: 14
Authors
Mattheyses, Wesley [1]
Latacz, Lukas [1]
Verhelst, Werner [1]
Affiliations
[1] Vrije Univ Brussel, Interdisciplinary Inst Broadband Technol IBBT, Dept ETRO DSSP, B-1050 Brussels, Belgium
Keywords
SYNTHETIC TALKING FACES;
DOI
10.1155/2009/169819
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Audiovisual text-to-speech systems convert a written text into an audiovisual speech signal. Typically, the visual mode of the synthetic speech is synthesized separately from the audio, the latter being either natural or synthesized speech. However, the perception of mismatches between these two information streams requires experimental exploration, since such mismatches could degrade the quality of the output. In order to increase the intermodal coherence in synthetic 2D photorealistic speech, we extended the well-known unit selection audio synthesis technique to work with multimodal segments containing original combinations of audio and video. Subjective experiments confirm that the audiovisual signals created by our multimodal synthesis strategy are indeed perceived as being more synchronous than those of systems in which the two modes are not intrinsically coherent. Furthermore, it is shown that the degree of coherence between the auditory mode and the visual mode has an influence on the perceived quality of the synthetic visual speech fragment. In addition, the audio quality was found to have only a minor influence on the perceived quality of the visual signal. Copyright (C) 2009 Wesley Mattheyses et al.
Pages: 12
Related papers
50 in total
  • [11] COMPARATIVE ANALYSIS OF AUDIOVISUAL, AUDITIVE AND VISUAL PERCEPTION OF SPEECH
    EWERTSEN, HW
    NIELSEN, HB
    ACTA OTO-LARYNGOLOGICA, 1971, 72 (03) : 201 - &
  • [12] The role of visual spatial attention in audiovisual speech perception
    Andersen, Tobias S.
    Tiippana, Kaisa
    Laarni, Jari
    Kojo, Ilpo
    Sams, Mikko
    SPEECH COMMUNICATION, 2009, 51 (02) : 184 - 193
  • [13] Detection of Audiovisual Speech Correspondences Without Visual Awareness
    Alsius, Agnes
    Munhall, Kevin G.
    PSYCHOLOGICAL SCIENCE, 2013, 24 (04) : 423 - 431
  • [14] The contribution of dynamic visual cues to audiovisual speech perception
    Jaekl, Philip
    Pesquita, Ana
    Alsius, Agnes
    Munhall, Kevin
    Soto-Faraco, Salvador
    NEUROPSYCHOLOGIA, 2015, 75 : 402 - 410
  • [15] On the robustness of audiovisual liveness detection to visual speech animation
    Komulainen, Jukka
    Anina, Iryna
    Holappa, Jukka
    Boutellaa, Elhocine
    Hadid, Abdenour
    2016 IEEE 8TH INTERNATIONAL CONFERENCE ON BIOMETRICS THEORY, APPLICATIONS AND SYSTEMS (BTAS), 2016,
  • [16] Visual and Auditory Components in the Perception of Asynchronous Audiovisual Speech
    Garcia-Perez, Miguel A.
    Alcala-Quintana, Rocio
    I-PERCEPTION, 2015, 6 (06) : 1 - 20
  • [17] An audiovisual test of kinematic primitives for visual speech perception
    Rosenblum, LD
    Saldana, HM
    JOURNAL OF EXPERIMENTAL PSYCHOLOGY-HUMAN PERCEPTION AND PERFORMANCE, 1996, 22 (02) : 318 - 331
  • [18] Mismatch Negativity with Visual-only and Audiovisual Speech
    Ponton, Curtis W.
    Bernstein, Lynne E.
    Auer, Edward T., Jr.
    BRAIN TOPOGRAPHY, 2009, 21 (3-4) : 207 - 215
  • [19] Distraction of visual attention reduces integration of audiovisual speech
    Tiippana, K.
    Sams, M.
    PERCEPTION, 2000, 29 : 22 - 22
  • [20] Produced quality is not perceived quality: A qualitative approach to overall audiovisual quality
    Jumisko-Pyykkoe, S.
    Reiter, U.
    Weigel, Chr
    2007 3DTV CONFERENCE, 2007, : 193 - +