Cultural Self-Adaptive Multimodal Gesture Generation Based on Multiple Culture Gesture Dataset

Cited by: 1
Authors
Wu, Jingyu [1 ]
Chen, Shi [2 ]
Gan, Shuyu [1 ]
Li, Weijun [1 ]
Yang, Changyuan [1 ]
Sun, Lingyun [2 ]
Affiliations
[1] Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
[2] Zhejiang Singapore Innovat & AI Joint Res Lab, Hangzhou, Zhejiang, Peoples R China
Keywords
co-speech gesture generation; datasets; multimodal chatbots; evaluation metric; nonverbal behavior; SPEECH; LANGUAGE;
DOI
10.1145/3581783.3611705
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Co-speech gesture generation is essential for multimodal chatbots and agents. Previous research has extensively studied the relationships among text, audio, and gesture. Meanwhile, to enhance cross-cultural communication, culture-specific gestures are crucial for chatbots to learn cultural differences and incorporate cultural cues. However, culture-specific gesture generation faces two challenges: the lack of large-scale, high-quality gesture datasets covering diverse cultural groups, and the lack of generalization across cultures. In this paper, we therefore first introduce the Multiple Culture Gesture Dataset (MCGD), the largest freely available gesture dataset to date, comprising ten different cultures, over 200 speakers, and 10,000 segmented sequences. We further propose a Cultural Self-adaptive Gesture Generation Network (CSGN) that takes multimodal relationships into consideration while generating gestures using a cascade architecture and learnable dynamic weights. The CSGN adaptively generates gestures with different cultural characteristics without retraining a new network: it extracts cultural features either from the multimodal inputs or from a cultural style embedding space for a designated culture. We evaluate our method broadly across four large-scale benchmark datasets. Empirical results show that our method achieves multi-cultural gesture generation and improves the comprehensiveness of multimodal inputs. Our method improves the state-of-the-art average FGD from 53.7 to 48.0 and the culture deception rate (CDR) from 33.63% to 39.87%.
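The abstract's two key ideas, a cultural style embedding space and a cascade generator fused by learnable dynamic weights, can be caricatured in a few lines. This is a minimal NumPy sketch, not the paper's implementation: all dimensions, the three-way modality split (text / audio / culture), the softmax weighting, and the tanh refinement stages are illustrative assumptions; in the real CSGN these components are trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): feature width and number of cultures.
D, N_CULTURES = 16, 10

# Cultural style embedding space: one vector per culture (would be learned).
culture_embeddings = rng.normal(size=(N_CULTURES, D))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse(text_feat, audio_feat, culture_feat, weight_logits):
    """Dynamic weighting over modality streams.

    In training, weight_logits would come from a small network conditioned
    on the inputs; here it is a fixed parameter vector for illustration.
    """
    w = softmax(weight_logits)                  # convex weights over modalities
    stacked = np.stack([text_feat, audio_feat, culture_feat])
    return (w[:, None] * stacked).sum(axis=0)   # shape (D,)

def cascade_generate(fused, n_stages=3):
    """Cascade: each stage refines the previous gesture estimate."""
    gesture = np.zeros_like(fused)
    for _ in range(n_stages):
        gesture = np.tanh(gesture + fused)      # stand-in for a learned stage
    return gesture

# Generating for a designated culture = selecting its style embedding;
# no retraining is needed to switch cultures.
text_feat = rng.normal(size=D)
audio_feat = rng.normal(size=D)
culture_feat = culture_embeddings[3]
fused = fuse(text_feat, audio_feat, culture_feat, np.array([0.5, 0.3, 0.2]))
gesture = cascade_generate(fused)
print(gesture.shape)
```

The point of the sketch is the control flow: swapping the culture index changes the style conditioning without touching the generator, which mirrors the abstract's claim that CSGN adapts across cultures without retraining.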
Pages: 3538-3549
Page count: 12