Sample-based synthesis of photo-realistic talking heads

Cited by: 47
Authors
Cosatto, E [1]
Graf, HP [1]
Affiliation
[1] AT&T Bell Labs, Res, Red Bank, NJ 07701 USA
Keywords
talking-head synthesis; sample-based synthesis; photo-realistic rendering; face recognition and location; sample-based coarticulation;
DOI
10.1109/CA.1998.681914
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This paper describes a system that generates photo-realistic video animations of talking heads. First the system derives head models from existing video footage using image recognition techniques. It locates, extracts and labels facial parts such as mouth, eyes, and eyebrows into a compact library. Then, using these face models and a text-to-speech synthesizer, it synthesizes new video sequences of the head where the lips are in synchrony with the accompanying soundtrack. Emotional cues and conversational signals are produced by combining head movements, raising eyebrows, wide open eyes, etc. with the mouth animation. For these animations to be believable, care has to be taken aligning the facial parts so that they blend smoothly into each other and produce seamless animations. Our system uses precise multi-channel facial recognition techniques to track facial parts, and it derives the exact 3D position of the head, enabling the automatic extraction of normalized face parts. Such talking-head animations are useful because they generally increase intelligibility of the human-machine interface in applications where content needs to be narrated to the user, such as educative software.
Pages: 103-110
Number of pages: 8
Related papers
50 in total
  • [1] Sample-based synthesis of talking heads
    Graf, HP
    Cosatto, E
    IEEE ICCV WORKSHOP ON RECOGNITION, ANALYSIS AND TRACKING OF FACES AND GESTURES IN REAL-TIME SYSTEMS, PROCEEDINGS, 2001, : 3 - 7
  • [2] Photo-Realistic Talking-Heads from Image Samples
    Cosatto, Eric
    Graf, Hans Peter
    IEEE TRANSACTIONS ON MULTIMEDIA, 2000, 2 (03) : 152 - 163
  • [3] Audio-visual unit selection for the synthesis of photo-realistic talking-heads
    Cosatto, E
    Potamianos, G
    Graf, HP
    2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 619 - 622
  • [4] Photo-Realistic Expressive Text to Talking Head Synthesis
    Wan, Vincent
    Anderson, Robert
    Blokland, Art
    Braunschweiler, Norbert
    Chen, Langzhou
    Kolluru, BalaKrishna
    Latorre, Javier
    Maia, Ranniery
    Stenger, Bjoern
    Yanagisawa, Kayoko
    Stylianou, Yannis
    Akamine, Masami
    Gales, Mark J. F.
    Cipolla, Roberto
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2666 - 2668
  • [5] HMM trajectory-guided sample selection for photo-realistic talking head
    Wang, Lijuan
    Soong, Frank K.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 (22) : 9849 - 9869
  • [6] HMM trajectory-guided sample selection for photo-realistic talking head
    Lijuan Wang
    Frank K. Soong
    Multimedia Tools and Applications, 2015, 74 : 9849 - 9869
  • [7] Photo-realistic facial expression synthesis
    Ghent, J
    McDonald, J
    IMAGE AND VISION COMPUTING, 2005, 23 (12) : 1041 - 1050
  • [8] Text Driven 3D Photo-Realistic Talking Head
    Wang, Lijuan
    Han, Wei
    Soong, Frank K.
    Huo, Qiang
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 3314 - 3315
  • [9] PAINT: Photo-realistic Fashion Design Synthesis
    Gu, Xiaoling
    Huang, Jie
    Wong, Yongkang
    Yu, Jun
    Fan, Jianping
    Peng, Pai
    Kankanhalli, Mohan S.
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (02)
  • [10] Image-based photo hulls for fast and photo-realistic new view synthesis
    Slabaugh, GG
    Schafer, RW
    Hans, MC
    REAL-TIME IMAGING, 2003, 9 (05) : 347 - 360