A multi-agent curiosity reward model for task-oriented dialogue systems

Cited by: 1
Authors
Sun, Jingtao [1, 2]
Kou, Jiayin [1, 2]
Hou, Wenyan [1, 2]
Bai, Yujei [1, 2]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Comp Sci & Technol, Xian 710121, Shaanxi, Peoples R China
[2] Xian Univ Posts & Telecommun, Shaanxi Key Lab Network Data Anal & Intelligent Pr, Xian, Peoples R China
Keywords
Task-oriented dialogue systems; Reinforcement learning; Curiosity rewards; Exploration and exploitation
DOI
10.1016/j.patcog.2024.110884
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In practical decision-making dialogues, reinforcement learning methods are hampered by delayed and sparse reward feedback and, in some cases, by the absence of rewards altogether. These issues impede efficient learning of dialogue strategies and degrade model performance. To address this challenge, this paper introduces the Multi-Agent Curiosity Reward Model (MACRM) for task-oriented dialogue systems. First, for the dialogue reward mechanism, a forward dynamics model generates curiosity rewards, which are combined with extrinsic rewards from dialogue environment feedback to mitigate the sparse rewards that result from inadequate agent exploration. Second, for the dialogue strategy training mechanism, an exploration-exploitation approach inspired by organismic exploration is adopted: the agent explores the dialogue environment fully in the early stages and exploits learned knowledge later, balancing exploration and exploitation and improving the efficiency of dialogue strategy learning. To assess the model's effectiveness, experiments are conducted on the MultiWOZ corpus in three reward environments: (1) extrinsic rewards only, (2) curiosity rewards only, and (3) a combination of both. The results show that agents employing MACRM learn dialogue strategies faster than those relying on a single exploratory reward method, effectively addressing reward sparsity and delay in practical decision-making scenarios.
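The reward combination described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption: the squared prediction error follows the common forward-dynamics curiosity formulation, the linear decay of the curiosity weight is one possible way to explore early and exploit later, and the names `curiosity_reward`, `curiosity_weight`, and `combined_reward` are hypothetical rather than taken from the paper.

```python
def curiosity_reward(predicted_next, actual_next, scale=1.0):
    """Intrinsic reward as the scaled squared error of a forward model.

    Assumption: the forward dynamics model predicts the next state
    features, and a larger prediction error means a more novel state.
    """
    error = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return scale * 0.5 * error


def curiosity_weight(step, total_steps, start=1.0, end=0.0):
    """Hypothetical schedule: linearly decay the curiosity weight.

    Early steps weight curiosity heavily (exploration); later steps
    rely mostly on the extrinsic reward (exploitation).
    """
    fraction = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * fraction


def combined_reward(extrinsic, intrinsic, step, total_steps):
    """Extrinsic environment reward plus weighted curiosity reward."""
    return extrinsic + curiosity_weight(step, total_steps) * intrinsic


if __name__ == "__main__":
    # Forward model predicted [1.0, 0.0]; the environment returned [0.0, 0.0].
    r_int = curiosity_reward([1.0, 0.0], [0.0, 0.0])
    print(combined_reward(extrinsic=0.0, intrinsic=r_int, step=0, total_steps=100))
```

Under these assumptions, an agent that receives no extrinsic reward on a turn still gets a learning signal whenever the forward model mispredicts the next state, which is the sparse-reward mitigation the abstract describes.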
Pages: 11
Related papers
50 in total
  • [21] EasyDial: A tool for task-oriented dialogue systems on the telephone
    Moisa, L
    Pinton, C
    Popovici, C
    NINTH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 1998, : 176 - 181
  • [22] Budgeted Policy Learning for Task-Oriented Dialogue Systems
    Zhang, Zhirui
    Li, Xiujun
    Gao, Jianfeng
    Chen, Enhong
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3742 - 3751
  • [23] Understanding User Satisfaction with Task-oriented Dialogue Systems
    Siro, Clemencia
    Aliannejadi, Mohammad
    de Rijke, Maarten
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 2018 - 2023
  • [24] Building Task-Oriented Dialogue Systems for Online Shopping
    Yan, Zhao
    Duan, Nan
    Chen, Peng
    Zhou, Ming
    Zhou, Jianshe
    Li, Zhoujun
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4618 - 4625
  • [25] A New Multi-level Knowledge Retrieval Model for Task-Oriented Dialogue
    Dong, Xuelian
    Chen, Jiale
    Weng, Heng
    Chen, Zili
    Wang, Fu Lee
    Hao, Tianyong
    NEURAL COMPUTING FOR ADVANCED APPLICATIONS, NCAA 2024, PT III, 2025, 2183 : 46 - 60
  • [26] Memory-Augmented Dialogue Management for Task-Oriented Dialogue Systems
    Zhang, Zheng
    Huang, Minlie
    Zhao, Zhongzhou
    Ji, Feng
    Chen, Haiqing
    Zhu, Xiaoyan
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2019, 37 (03)
  • [27] Model discrepancy policy optimization for task-oriented dialogue
    Zhou, Zhenyou
    Liu, Zhibin
    Dong, Zhaoan
    Liu, Yuhan
    COMPUTER SPEECH AND LANGUAGE, 2024, 87
  • [28] A Hierarchical Memory Model for Task-Oriented Dialogue System
    Zeng, Ya
    Wan, Li
    Luo, Qiuhong
    Chen, Mao
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (08) : 1481 - 1489
  • [29] Pretraining the Noisy Channel Model for Task-Oriented Dialogue
    Liu, Qi
    Yu, Lei
    Rimell, Laura
    Blunsom, Phil
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2021, 9 : 657 - 674
  • [30] Variational Reward Estimator Bottleneck: Towards Robust Reward Estimator for Multidomain Task-Oriented Dialogue
    Park, Jeiyoon
    Lee, Chanhee
    Park, Chanjun
    Kim, Kuekyeng
    Lim, Heuiseok
    APPLIED SCIENCES-BASEL, 2021, 11 (14):