Cultural Self-Adaptive Multimodal Gesture Generation Based on Multiple Culture Gesture Dataset

Times Cited: 1
Authors
Wu, Jingyu [1 ]
Chen, Shi [2 ]
Gan, Shuyu [1 ]
Li, Weijun [1 ]
Yang, Changyuan [1 ]
Sun, Lingyun [2 ]
Affiliations
[1] College of Computer Science and Technology, Hangzhou, Zhejiang, People's Republic of China
[2] Zhejiang Singapore Innovation and AI Joint Research Lab, Hangzhou, Zhejiang, People's Republic of China
Keywords
co-speech gesture generation; datasets; multimodal chatbots; evaluation metric; nonverbal behavior; speech; language
DOI
10.1145/3581783.3611705
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Co-speech gesture generation is essential for multimodal chatbots and agents. Previous research has extensively studied the relationships among text, audio, and gesture. Meanwhile, to enhance cross-cultural communication, culture-specific gestures are crucial for chatbots to learn cultural differences and incorporate cultural cues. However, culture-specific gesture generation faces two challenges: the lack of large-scale, high-quality gesture datasets that cover diverse cultural groups, and the lack of generalization across cultures. In this paper, we first introduce the Multiple Culture Gesture Dataset (MCGD), the largest freely available gesture dataset to date, comprising ten cultures, more than 200 speakers, and 10,000 segmented sequences. We further propose a Cultural Self-adaptive Gesture Generation Network (CSGN) that takes multimodal relationships into consideration while generating gestures, using a cascade architecture and learnable dynamic weights. The CSGN adaptively generates gestures with different cultural characteristics without retraining a new network: it extracts cultural features either from the multimodal inputs or from a cultural style embedding space with a designated culture. We evaluate our method broadly across four large-scale benchmark datasets. Empirical results show that our method achieves multi-cultural gesture generation and improves the comprehensiveness of multimodal inputs, improving the state-of-the-art average FGD from 53.7 to 48.0 and the culture deception rate (CDR) from 33.63% to 39.87%.
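For context on the FGD figures above: Fréchet Gesture Distance compares the distribution of latent features of generated gestures against that of real gestures, analogously to FID for images. The sketch below is a minimal illustration of how such an FGD-style score is commonly computed; it is not taken from the paper. The function name frechet_gesture_distance and the assumption that a pretrained gesture autoencoder supplies the feature arrays real_feats and gen_feats are illustrative only.

    import numpy as np
    from scipy import linalg

    def frechet_gesture_distance(real_feats, gen_feats):
        # real_feats, gen_feats: (N, D) arrays of latent features, assumed to be
        # extracted from real and generated gesture clips by a pretrained
        # autoencoder (hypothetical setup for illustration).
        mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
        sigma_r = np.cov(real_feats, rowvar=False)
        sigma_g = np.cov(gen_feats, rowvar=False)

        diff = mu_r - mu_g
        # Matrix square root of the product of the two covariance matrices.
        covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
        if np.iscomplexobj(covmean):
            covmean = covmean.real

        # Fréchet distance between the two Gaussian fits; lower is better.
        return diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean)

Under these assumptions, a lower score means the generated gestures' feature statistics are closer to those of real motion, which is how an improvement from 53.7 to 48.0 should be read.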
Pages: 3538-3549
Number of pages: 12