Speaker-Adaptive Multimodal Prediction Model for Listener Responses

Cited by: 5
Authors
de Kok, Iwan [1 ]
Heylen, Dirk [1 ]
Morency, Louis-Philippe [2 ]
Affiliations
[1] Univ Twente, Human Media Interact, Enschede, Netherlands
[2] USC Inst Creat Technol, Los Angeles, CA USA
Keywords
Algorithms; Human Factors; Theory; Listener Responses; Machine Learning; Social Behavior; Multimodal; Features
DOI
10.1145/2522848.2522866
CLC number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
The goal of this paper is to analyze and model the variability in speaking styles in dyadic interactions and to build a predictive algorithm for listener responses that adapts to these different styles. The end result of this research will be a virtual human able to automatically respond to a human speaker with proper listener responses (e.g., head nods). Our novel speaker-adaptive prediction model is created from a corpus of dyadic interactions in which speaker variability is analyzed to identify a subset of prototypical speaker styles. During a live interaction, our prediction model automatically identifies the closest prototypical speaker style and predicts listener responses based on this "communicative style". Central to our approach is the idea of a "speaker profile", which uniquely identifies each speaker and enables matching between prototypical speakers and new speakers. The paper demonstrates the merits of our speaker-adaptive listener response prediction model by showing improvement over a state-of-the-art approach that does not adapt to the speaker. Beyond the merits of speaker adaptation, our experiments highlight the importance of using multimodal features when comparing speakers to select the closest prototypical speaker style.
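The adaptive matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, hypothetically, that a speaker profile is a fixed-length multimodal feature vector, that prototypical speaker styles are stored as centroids learned offline from the dyadic corpus, and that each style owns a pre-trained listener-response predictor. All class and variable names are illustrative.

# Minimal sketch of speaker-adaptive listener-response prediction (assumptions above).
import numpy as np

class SpeakerAdaptivePredictor:
    def __init__(self, style_centroids, style_predictors):
        # style_centroids: (n_styles, n_features) array of prototypical speaker profiles
        # style_predictors: one pre-trained listener-response model per style,
        # each exposing a .predict(frame_features) method (hypothetical interface)
        self.style_centroids = np.asarray(style_centroids)
        self.style_predictors = style_predictors

    def closest_style(self, speaker_profile):
        # Match the live speaker's multimodal profile to the nearest prototypical
        # style; Euclidean distance is an assumption, not the paper's metric.
        distances = np.linalg.norm(self.style_centroids - np.asarray(speaker_profile), axis=1)
        return int(np.argmin(distances))

    def predict_listener_response(self, speaker_profile, frame_features):
        # Select the matched style's predictor and score the current frame,
        # e.g., how appropriate a head nod would be at this moment.
        style_idx = self.closest_style(speaker_profile)
        return self.style_predictors[style_idx].predict(frame_features)

In this reading, prediction for a new speaker reduces to a nearest-prototype lookup followed by a per-style model call; the paper's actual matching scheme and predictors may differ from this sketch.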
Pages: 51-58
Number of pages: 8