A multi-agent curiosity reward model for task-oriented dialogue systems

Cited by: 1
Authors
Sun, Jingtao [1 ,2 ]
Kou, Jiayin [1 ,2 ]
Hou, Wenyan [1 ,2 ]
Bai, Yujei [1 ,2 ]
Affiliations
[1] Xian Univ Posts & Telecommun, Sch Comp Sci & Technol, Xian 710121, Shaanxi, Peoples R China
[2] Xian Univ Posts & Telecommun, Shaanxi Key Lab Network Data Anal & Intelligent Pr, Xian, Peoples R China
Keywords
Task-oriented dialogue systems; Reinforcement learning; Curiosity rewards; Exploration and exploitation;
DOI
10.1016/j.patcog.2024.110884
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In practical decision-making dialogues, reinforcement learning methods are hampered by delayed and sparse reward feedback, and in some cases by the absence of rewards altogether. These issues impede efficient learning of dialogue strategies and degrade model performance. To address this challenge, this paper introduces the Multi-Agent Curiosity Reward Model (MACRM) for task-oriented dialogue systems. First, for the dialogue reward mechanism, a forward dynamics model generates curiosity rewards, which are combined with extrinsic rewards from dialogue environment feedback to mitigate the sparse rewards that result from inadequate agent exploration. Second, for the dialogue strategy training mechanism, an exploration-exploitation approach inspired by organismic exploration is adopted: the agent explores the dialogue environment fully in the early stages and exploits the learned knowledge later, balancing exploration and exploitation and improving the efficiency of dialogue strategy learning. To assess the model's effectiveness, experiments are conducted on the MultiWOZ corpus in three reward environments: (1) extrinsic rewards only, (2) curiosity rewards only, and (3) a combination of both. The results show that agents using MACRM learn dialogue strategies faster than those relying on a single exploratory reward method, effectively addressing reward sparsity and delay in practical decision-making scenarios.
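The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' MACRM implementation. It assumes the common formulation in which the curiosity reward is the squared prediction error of a forward dynamics model, and it assumes a linearly decaying weight to shift from exploration to exploitation. All function names, the `scale` parameter, and the decay schedule are hypothetical choices made for illustration.

```python
def curiosity_reward(predicted_next, actual_next, scale=0.5):
    """Intrinsic reward: scaled squared error between the forward model's
    predicted next-state features and the observed ones (assumed form)."""
    return scale * sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))


def exploration_weight(step, total_steps):
    """Hypothetical schedule: weight on curiosity decays linearly from 1 to 0,
    so early training favors exploration and later training exploitation."""
    return max(0.0, 1.0 - step / total_steps)


def combined_reward(extrinsic, intrinsic, step, total_steps):
    """Extrinsic environment reward plus the down-weighted curiosity reward."""
    return extrinsic + exploration_weight(step, total_steps) * intrinsic


if __name__ == "__main__":
    # Toy example: the forward model mispredicts one of two state features.
    r_int = curiosity_reward([1.0, 0.0], [0.0, 0.0])
    for step in (0, 50, 100):
        print(step, combined_reward(1.0, r_int, step, total_steps=100))
```

With this schedule the curiosity term dominates when extrinsic feedback is sparse early on and vanishes by the end of training; the three reward environments in the abstract correspond to dropping the intrinsic term, dropping the extrinsic term, or keeping both.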
Pages: 11
Related papers
50 in total
  • [31] Multi-task learning with graph attention networks for multi-domain task-oriented dialogue systems
    Zhao, Meng
    Wang, Lifang
    Jiang, Zejun
    Li, Ronghan
    Lu, Xinyu
    Hu, Zhongtian
    KNOWLEDGE-BASED SYSTEMS, 2023, 259
  • [32] Mutually improved response generation and dialogue summarization for multi-domain task-oriented dialogue systems
    Zhao, Meng
    Wang, Lifang
    Ji, Hongru
    Jiang, Zejun
    Li, Ronghan
    Lu, Xinyu
    Hu, Zhongtian
    KNOWLEDGE-BASED SYSTEMS, 2023, 279
  • [33] Task-oriented Resource Allocation for Mobile Edge Computing with Multi-Agent Reinforcement Learning
    Zou, Yue
    Shen, Fei
    Yan, Feng
    Tang, Liang
    2021 IEEE 94TH VEHICULAR TECHNOLOGY CONFERENCE (VTC2021-FALL), 2021,
  • [34] Multi-task Learning for Natural Language Generation in Task-Oriented Dialogue
    Zhu, Chenguang
    Zeng, Michael
    Huang, Xuedong
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 1261 - 1266
  • [35] High-Quality Diversification for Task-Oriented Dialogue Systems
    Tang, Zhiwen
    Kulkarni, Hrishikesh
    Yang, Grace Hui
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 1861 - 1872
  • [36] Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems
    Sun, Weiwei
    Zhang, Shuo
    Balog, Krisztian
    Ren, Zhaochun
    Ren, Pengjie
    Chen, Zhumin
    de Rijke, Maarten
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 2499 - 2506
  • [37] MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems
    Lin, Zhaojiang
    Madotto, Andrea
    Winata, Genta Indra
    Fung, Pascale
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 3391 - 3405
  • [38] Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems
    Sun, Weiwei
    Guo, Shuyu
    Zhang, Shuo
    Ren, Pengjie
    Chen, Zhumin
    de Rijke, Maarten
    Ren, Zhaochun
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2024, 42 (01)
  • [39] Training Neural Response Selection for Task-Oriented Dialogue Systems
    Henderson, Matthew
    Vulic, Ivan
    Gerz, Daniela
    Casanueva, Inigo
    Budzianowski, Pawel
    Coope, Sam
    Spithourakis, Georgios
    Wen, Tsung-Hsien
    Mrksic, Nikola
    Su, Pei-Hao
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 5392 - 5404
  • [40] Task-Oriented Dialogue as Dataflow Synthesis
    Andreas, Jacob
    Bufe, John
    Burkett, David
    Chen, Charles
    Clausman, Josh
    Crawford, Jean
    Crim, Kate
    DeLoach, Jordan
    Dorner, Leah
    Eisner, Jason
    Fang, Hao
    Guo, Alan
    Hall, David
    Hayes, Kristin
    Hill, Kellie
    Ho, Diana
    Iwaszuk, Wendy
    Jha, Smriti
    Klein, Dan
    Krishnamurthy, Jayant
    Lanman, Theo
    Liang, Percy
    Lin, Christopher H.
    Lintsbakh, Ilya
    McGovern, Andy
    Nisnevich, Aleksandr
    Pauls, Adam
    Petters, Dmitrij
    Read, Brent
    Roth, Dan
    Roy, Subhro
    Rusak, Jesse
    Short, Beth
    Slomin, Div
    Snyder, Ben
    Striplin, Stephon
    Su, Yu
    Tellman, Zachary
    Thomson, Sam
    Vorobev, Andrei
    Witoszko, Izabela
    Wolfe, Jason
    Wray, Abby
    Zhang, Yuchen
    Zotov, Alexander
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2020, 8 (08) : 556 - 571