Speaker-Adaptive Multimodal Prediction Model for Listener Responses

Cited by: 5
Authors
de Kok, Iwan [1 ]
Heylen, Dirk [1 ]
Morency, Louis-Philippe [2 ]
Affiliations
[1] Univ Twente, Human Media Interact, Enschede, Netherlands
[2] USC Inst Creat Technol, Los Angeles, CA USA
Keywords
Algorithms; Human Factors; Theory; Listener Responses; Machine Learning; Social Behavior; Multimodal; Features
DOI
10.1145/2522848.2522866
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
The goal of this paper is to analyze and model the variability in speaking styles in dyadic interactions and to build a predictive algorithm for listener responses that is able to adapt to these different styles. The end result of this research will be a virtual human able to automatically respond to a human speaker with proper listener responses (e.g., head nods). Our novel speaker-adaptive prediction model is created from a corpus of dyadic interactions in which speaker variability is analyzed to identify a subset of prototypical speaker styles. During a live interaction, our prediction model automatically identifies the closest prototypical speaker style and predicts listener responses based on this "communicative style". Central to our approach is the idea of a "speaker profile", which uniquely identifies each speaker and enables the matching between prototypical speakers and new speakers. The paper demonstrates the merits of our speaker-adaptive listener response prediction model by showing improvement over a state-of-the-art approach that does not adapt to the speaker. Beyond the merits of speaker adaptation, our experiments highlight the importance of using multimodal features when comparing speakers to select the closest prototypical speaker style.
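The abstract above outlines the architecture only. As a reading aid, the following minimal Python sketch illustrates the prototype-matching idea, assuming toy profile features, k-means clustering over speaker profiles, Euclidean matching, and placeholder per-style predictors; none of these choices are taken from the paper.

# Hypothetical sketch of the prototype-matching idea described in the abstract.
# Profile features, number of prototypes, distance metric, and predictors are
# assumptions for illustration, not details from the paper.
import numpy as np
from sklearn.cluster import KMeans

# Offline: build a "speaker profile" per corpus speaker from multimodal summary
# statistics, then cluster the profiles into a few prototypical speaking styles.
corpus_profiles = np.random.rand(40, 12)        # 40 speakers x 12 profile features (toy data)
kmeans = KMeans(n_clusters=4, random_state=0).fit(corpus_profiles)
prototype_profiles = kmeans.cluster_centers_     # one profile per prototypical style

# One listener-response predictor per prototype would be trained offline;
# here they are stood in by simple placeholder callables.
def make_predictor(style_id):
    def predict(frame_features):
        # Placeholder: a real predictor would map the speaker's multimodal
        # frame features to a listener-response (e.g., head-nod) probability.
        return float(frame_features.mean()) * (0.5 + 0.1 * style_id)
    return predict

style_predictors = [make_predictor(i) for i in range(len(prototype_profiles))]

# Online: accumulate the new speaker's profile during the interaction,
# match it to the closest prototypical style, and use that style's predictor.
def closest_style(new_profile):
    dists = np.linalg.norm(prototype_profiles - new_profile, axis=1)
    return int(np.argmin(dists))

new_speaker_profile = np.random.rand(12)
style = closest_style(new_speaker_profile)
frame = np.random.rand(12)
print("matched style:", style, "listener-response score:", style_predictors[style](frame))

In the actual system, the speaker profiles and the per-style predictors would be learned from the corpus's multimodal features; the sketch only shows how the offline clustering and the online matching relate to each other.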
Pages: 51-58
Page count: 8