Accurate Visual Speech Synthesis Based on Diviseme Unit Selection and Concatenation

Authors
Jiang, Dongmei [1]
Ravyse, Ilse [2]
Sahli, Hichem [2]
Zhang, Yanning [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Joint Res Grp Audio Visual Signal Proc, 127 Youyi Xilu, Xian 710072, Peoples R China
[2] Vrije Univ Brussel, Dept ETRO, B-1050 Brussels, Belgium
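Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract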
This paper presents a novel speech-driven approach to accurate, realistic visual speech synthesis. First, an audio-visual instance database is built for different viseme context combinations, i.e. diviseme units, using 100 audio-visual speech sentences of a female speaker. Then a diviseme instance selection algorithm is introduced to choose the optimal diviseme instances for the viseme contexts in the input speech, considering the concatenation smoothness of the image sequences, the matching of the mouth movements to the acoustic pronunciation process, and the intensity of the input speech. Finally, the mouth image sequences of the corresponding viseme segments in the selected diviseme instances are time-warped and blended to construct the mouth images of the final animation. Visual speech synthesis experiments and subjective evaluation results show that the resulting mouth animations are not only realistic, with clear and smooth mouth images, but also in good accordance with the acoustic pronunciation and intensity of the input speech.
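The selection step described in the abstract can be illustrated with a generic unit-selection sketch. This record does not give the paper's actual cost functions or search procedure, so everything below is an assumption: `select_units` is a standard Viterbi-style dynamic program over hypothetical `target_cost` (how well an instance matches the input speech) and `concat_cost` (how smoothly two adjacent instances join) callbacks, and `warp_indices` is a plain linear time warp standing in for whatever warping the authors used. Blending is not shown.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_units(
    candidates: Sequence[Sequence[T]],
    target_cost: Callable[[int, T], float],
    concat_cost: Callable[[T, T], float],
) -> List[T]:
    """Pick one candidate per slot, minimizing summed target + concatenation cost.

    Hypothetical helper: the cost callbacks are placeholders, not the paper's.
    """
    if not candidates or any(len(slot) == 0 for slot in candidates):
        raise ValueError("every slot needs at least one candidate")

    # best[j]: cheapest total cost of any path ending at candidate j of the current slot
    best = [target_cost(0, c) for c in candidates[0]]
    back: List[List[int]] = []  # back[t-1][j]: best predecessor index for slot t, candidate j

    for t in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for c in candidates[t]:
            costs = [best[i] + concat_cost(p, c) for i, p in enumerate(candidates[t - 1])]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min] + target_cost(t, c))
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Trace the cheapest path back from the last slot to the first.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]


def warp_indices(n_source: int, n_target: int) -> List[int]:
    """Linear time warp: for each of n_target output frames, the nearest source frame index."""
    if n_source < 1 or n_target < 1:
        raise ValueError("frame counts must be positive")
    if n_target == 1:
        return [0]
    span = n_target - 1
    # Integer arithmetic with round-half-up, to avoid floating-point rounding surprises.
    return [(2 * k * (n_source - 1) + span) // (2 * span) for k in range(n_target)]
```

As a toy usage, with scalar "instances", `select_units([[0, 10], [1, 9], [0, 10]], lambda t, c: abs(c - [0, 9, 10][t]), lambda a, b: abs(a - b))` returns `[0, 9, 10]`, and `warp_indices(5, 3)` returns `[0, 2, 4]`. In a real system the candidates would be stored diviseme instances and the costs would compare image and acoustic features.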
Pages: 910 / +
Page count: 2