Cultural Self-Adaptive Multimodal Gesture Generation Based on Multiple Culture Gesture Dataset

Cited by: 1
Authors
Wu, Jingyu [1 ]
Chen, Shi [2 ]
Gan, Shuyu [1 ]
Li, Weijun [1 ]
Yang, Changyuan [1 ]
Sun, Lingyun [2 ]
Affiliations
[1] Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
[2] Zhejiang Singapore Innovat & AI Joint Res Lab, Hangzhou, Zhejiang, Peoples R China
Keywords
co-speech gesture generation; datasets; multimodal chatbots; evaluation metric; nonverbal behavior; SPEECH; LANGUAGE
DOI
10.1145/3581783.3611705
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Co-speech gesture generation is essential for multimodal chatbots and agents. Previous research has extensively studied the relationships among text, audio, and gesture. Meanwhile, to enhance cross-cultural communication, culture-specific gestures are crucial for chatbots to learn cultural differences and incorporate cultural cues. However, culture-specific gesture generation faces two challenges: the lack of large-scale, high-quality gesture datasets covering diverse cultural groups, and the lack of generalization across cultures. In this paper, we first introduce the Multiple Culture Gesture Dataset (MCGD), the largest freely available gesture dataset to date, comprising ten different cultures, over 200 speakers, and 10,000 segmented sequences. We further propose a Cultural Self-adaptive Gesture Generation Network (CSGN) that takes multimodal relationships into consideration while generating gestures, using a cascade architecture and learnable dynamic weights. The CSGN adaptively generates gestures with different cultural characteristics without retraining a new network: it extracts cultural features either from the multimodal inputs or from a cultural style embedding space with a designated culture. We broadly evaluate our method across four large-scale benchmark datasets. Empirical results show that our method achieves gesture generation across multiple cultures and improves the comprehensiveness of multimodal inputs, improving the state-of-the-art average FGD from 53.7 to 48.0 and the culture deception rate (CDR) from 33.63% to 39.87%.
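The FGD figure quoted in the abstract is the Fréchet Gesture Distance, which — analogously to FID for images — fits a Gaussian to latent features of real and of generated gesture sequences and measures the Fréchet distance between the two fits. A minimal sketch of that computation, assuming pre-extracted feature matrices of shape (num_clips, feature_dim); the function name and shapes are illustrative, not taken from the paper:

```python
import numpy as np
from scipy import linalg


def frechet_gesture_distance(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    """Fréchet distance between Gaussian fits of two feature sets.

    d^2 = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^{1/2}),
    where mu/S are the mean and covariance of each feature set.
    """
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(gen_feats, rowvar=False)

    # Matrix square root of the covariance product; numerical noise can
    # introduce a tiny imaginary component, which we discard.
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    covmean = covmean.real

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```

Identical distributions yield a distance near zero, and the score grows as the generated-feature distribution drifts from the real one, which is why a drop from 53.7 to 48.0 indicates gestures closer to the real data.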
Pages: 3538–3549
Page count: 12