An Angle-Oriented Approach to Transferring Speech to Gesture for Highly Anthropomorphized Embodied Conversational Agents

Cited: 0
Authors
Liu, Jiwei [1 ]
Qin, Zheng [1 ]
Zhang, Zixing [1 ]
Zhang, Jixin [2 ]
Affiliations
[1] Hunan Univ, Coll Comp Sci & Elect Engn, Changsha 410082, Peoples R China
[2] Hubei Univ Technol, Sch Comp Sci, Nanli Rd, Wuhan 430068, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech to gesture; co-speech gesture; embodied conversational agent; anthropomorphization; ELIZA;
DOI
10.1142/S1469026824500068
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Realistic co-speech gestures are important for anthropomorphizing embodied conversational agents (ECAs), since nonverbal behavior greatly improves the expressiveness of their speech. However, existing approaches that generate co-speech gestures with sufficient detail (including the fingers) in 3D scenarios are rare, and they seldom address abnormal gestures, temporal-spatial coherence, and the diversity of gesture sequences in a comprehensive way. To handle abnormal gestures, we propose an angle-conversion method that removes body-part length from the original in-the-wild video dataset by converting the coordinates of human upper-body key points into relative deflection angles and pitch angles. We also propose HARP, a neural network with an encoder-decoder architecture built on CNN and LSTM layers, which maps MFCC features of the speech audio to these angles; the angles are then rendered as the corresponding co-speech gestures. Compared with other recent approaches, the co-speech gestures generated by HARP are shown to be almost as good as those of a real person, i.e., they exhibit strong temporal-spatial coherence, diversity, persuasiveness, and credibility. Our approach offers finer control over co-speech gestures than most existing work by handling all key points of the human upper body, and it is more feasible for industrial application because HARP can adapt to any human upper-body model. All related code and evidence videos of HARP can be accessed at https://github.com/drrobincroft/HARP.
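The core of the angle-conversion step described above is to replace raw key-point coordinates with length-free angles relative to a parent joint, so the representation transfers to any upper-body model. The sketch below illustrates that idea only; the bone list, joint indices, and function name are illustrative assumptions, not the authors' implementation (which is available in the linked repository).

```python
import numpy as np

# Hypothetical upper-body skeleton as (parent, child) index pairs.
# Indices follow a COCO-like layout; the paper's exact key-point set may differ.
BONES = [
    (1, 0),                    # neck -> nose
    (1, 2), (2, 3), (3, 4),    # neck -> right shoulder -> elbow -> wrist
    (1, 5), (5, 6), (6, 7),    # neck -> left shoulder  -> elbow -> wrist
]

def keypoints_to_angles(kpts: np.ndarray) -> np.ndarray:
    """Convert 3D upper-body key points (J x 3) into per-bone deflection
    (azimuth) and pitch angles, discarding bone length entirely."""
    angles = []
    for parent, child in BONES:
        dx, dy, dz = kpts[child] - kpts[parent]       # bone vector
        deflection = np.arctan2(dy, dx)               # rotation within the x-y plane
        pitch = np.arctan2(dz, np.hypot(dx, dy))      # elevation out of the x-y plane
        angles.extend([deflection, pitch])
    return np.asarray(angles)                         # shape: (2 * len(BONES),)

# Purely illustrative usage: a random 8-joint pose.
pose = np.random.randn(8, 3)
print(keypoints_to_angles(pose).shape)                # -> (14,)
```

Because the angles carry no bone-length information, retargeting to a new character only requires re-applying that character's fixed bone lengths when the angles are rendered back into a pose.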
Pages: 22
Related Papers
20 records in total
  • [1] Workshop on Speech and Gesture Production in Virtually and Physically Embodied Conversational Agents
    Mead, Ross
    Salem, Maha
    ICMI '12: PROCEEDINGS OF THE ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2012, : 607 - 608
  • [2] THE ROLE OF GESTURE IN DOCUMENT EXPLANATION BY EMBODIED CONVERSATIONAL AGENTS
    Bickmore, Timothy
    Pfeifer, Laura
    Yin, Langxuan
    INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING, 2008, 2 (01) : 47 - 70
  • [3] Implementing expressive gesture synthesis for embodied conversational agents
    Hartmann, Bjorn
    Mancini, Maurizio
    Pelachaud, Catherine
    GESTURE IN HUMAN-COMPUTER INTERACTION AND SIMULATION, 2006, 3881 : 188 - 199
  • [4] A Review of Evaluation Practices of Gesture Generation in Embodied Conversational Agents
    Wolfert, Pieter
    Robinson, Nicole
    Belpaeme, Tony
    IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, 2022, 52 (03) : 379 - 389
  • [5] Influence of Directivity on the Perception of Embodied Conversational Agents' Speech
    Wendt, Jonathan
    Weyers, Benjamin
    Stienen, Jonas
    Boensch, Andrea
    Vorlaender, Michael
    Kuhlen, Torsten W.
    PROCEEDINGS OF THE 19TH ACM INTERNATIONAL CONFERENCE ON INTELLIGENT VIRTUAL AGENTS (IVA' 19), 2019, : 130 - 132
  • [6] Speech-Gesture GAN: Gesture Generation for Robots and Embodied Agents
    Liu, Carson Yu
    Mohammadi, Gelareh
    Song, Yang
    Johal, Wafa
    2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, 2023, : 405 - 412
  • [7] Automatic text-to-gesture rule generation for embodied conversational agents
    Ali, Ghazanfar
    Lee, Myungho
    Hwang, Jae-In
    COMPUTER ANIMATION AND VIRTUAL WORLDS, 2020, 31 (4-5)
  • [8] Recorded Speech, Virtual Environments, and the Effectiveness of Embodied Conversational Agents
    Gris, Ivan
    Novick, David
    Camacho, Adriana
    Rivera, Diego A.
    Gutierrez, Mario
    Rayon, Alex
    INTELLIGENT VIRTUAL AGENTS, IVA 2014, 2014, 8637 : 182 - 185
  • [9] The effect of embodied conversational agents' speech quality on users' attention and emotion
    Chateau, N
    Maffiolo, V
    Pican, N
    Mersiol, M
    AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION, PROCEEDINGS, 2005, 3784 : 652 - 659
  • [10] Audio based real-time speech animation of embodied conversational agents
    Malcangi, M
    de Tintis, R
    GESTURE-BASED COMMUNICATION IN HUMAN-COMPUTER INTERACTION, 2003, 2915 : 350 - 360