A NON-HIERARCHICAL ATTENTION NETWORK WITH MODALITY DROPOUT FOR TEXTUAL RESPONSE GENERATION IN MULTIMODAL DIALOGUE SYSTEMS

Cited by: 2
Authors
Sun, Rongyi [1]
Chen, Borun [1]
Zhou, Qingyu [2]
Li, Yinghui [1]
Cao, Yunbo [2]
Zheng, Hai-Tao [1]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Beijing, Peoples R China
[2] Tencent Cloud Xiaowei, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal Dialogue Systems; HRED; Non-Hierarchical; Attention; Modality Dropout;
DOI
10.1109/ICASSP43922.2022.9746613
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Existing text- and image-based multimodal dialogue systems use the traditional Hierarchical Recurrent Encoder-Decoder (HRED) framework, which has an utterance-level encoder to model the utterance representation and a context-level encoder to model the context representation. Although pioneering efforts have shown promising performance, they still face two challenges: (1) the interaction between textual and visual features is not fine-grained enough; (2) the context representation cannot provide a complete representation of the context. To address these issues, we propose a non-hierarchical attention network with modality dropout, which abandons the HRED framework and uses attention modules to encode each utterance and model the context representation. To evaluate the proposed model, we conduct comprehensive experiments on a public multimodal dialogue dataset. Automatic and human evaluations demonstrate that the proposed model outperforms existing methods and achieves state-of-the-art performance.
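The abstract names modality dropout but does not say how it is applied. As a rough illustration only, the sketch below shows a generic form of the technique: during training, one modality's features are occasionally zeroed so the model cannot over-rely on either one. The function name `modality_dropout`, the parameter `p_drop`, the even split between modalities, and the list-of-vectors feature layout are all assumptions made for this example, not details taken from the paper.

```python
import random


def modality_dropout(text_feats, image_feats, p_drop=0.3, training=True, rng=random):
    """Generic modality dropout sketch (hypothetical, not the paper's exact scheme).

    text_feats / image_feats: lists of feature vectors (lists of floats).
    With probability p_drop, exactly one modality is replaced by zeros.
    The inputs are never modified in place.
    """
    if not training:
        # At inference time both modalities pass through untouched.
        return text_feats, image_feats

    if rng.random() < p_drop:
        # Assumption: drop text or image with equal chance, never both,
        # so every example keeps at least one informative modality.
        if rng.random() < 0.5:
            text_feats = [[0.0] * len(vec) for vec in text_feats]
        else:
            image_feats = [[0.0] * len(vec) for vec in image_feats]

    return text_feats, image_feats


if __name__ == "__main__":
    text = [[0.2, 0.7], [0.1, 0.4]]   # two token vectors (made-up values)
    image = [[0.9, 0.3]]              # one image-region vector (made-up values)
    print(modality_dropout(text, image, p_drop=1.0, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance as `rng` makes the dropout decisions reproducible, which is convenient when checking the behaviour by hand.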
Pages: 6582 - 6586
Page count: 5
Related papers
7 in total
  • [1] Hierarchical multimodal attention for end-to-end audio-visual scene-aware dialogue response generation
    Le, Hung
    Sahoo, Doyen
    Chen, Nancy F.
    Hoi, Steven C. H.
    COMPUTER SPEECH AND LANGUAGE, 2020, 63
  • [2] Hierarchical Recurrent Attention Network for Response Generation
    Xing, Chen
    Wu, Yu
    Wu, Wei
    Huang, Yalou
    Zhou, Ming
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 5610 - 5617
  • [3] Multi-Aspect Controlled Response Generation in a Multimodal Dialogue System using Hierarchical Transformer Network
    Firdaus, Mauajama
    Thakur, Nidhi
    Ekbal, Asif
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [4] Neural Network With Hierarchical Attention Mechanism for Contextual Topic Dialogue Generation
    Sun, Xiao
    Ding, Bingbing
    IEEE ACCESS, 2022, 10 : 4628 - 4639
  • [5] Hierarchical Knowledge Aggregation for Personalized Response Generation in Dialogue Systems
    Dong, Yuezhou
    Qin, Ke
    Liang, Shuang
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT I, NLPCC 2024, 2025, 15359 : 29 - 42
  • [6] HSAN: A HIERARCHICAL SELF-ATTENTION NETWORK FOR MULTI-TURN DIALOGUE GENERATION
    Kong, Yawei
    Zhang, Lu
    Ma, Can
    Cao, Cong
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7433 - 7437
  • [7] A Hierarchical Structured Multi-Head Attention Network for Multi-Turn Response Generation
    Lin, Fei
    Zhang, Cong
    Liu, Shengqiang
    Ma, Hong
    IEEE ACCESS, 2020, 8 : 46802 - 46810