A multi-agent curiosity reward model for task-oriented dialogue systems

Cited by: 1

Authors:

Sun, Jingtao [1,2]

Kou, Jiayin [1,2]

Hou, Wenyan [1,2]

Bai, Yujei [1,2]

Affiliations:

[1] Xian Univ Posts & Telecommun, Sch Comp Sci & Technol, Xian 710121, Shaanxi, Peoples R China

[2] Xian Univ Posts & Telecommun, Shaanxi Key Lab Network Data Anal & Intelligent Pr, Xian, Peoples R China

Source:

PATTERN RECOGNITION | 2025 / Vol. 157

Keywords:

Task-oriented dialogue systems; Reinforcement learning; Curiosity rewards; Exploration and exploitation;

DOI:

10.1016/j.patcog.2024.110884

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In practical decision-making dialogues, reinforcement learning methods are hindered by delayed and sparse reward feedback and, in some cases, by the absence of rewards altogether. These issues can impede efficient learning of dialogue strategies and degrade model performance. To address this challenge, this paper introduces the Multi-Agent Curiosity Reward Model (MACRM) for task-oriented dialogue systems. First, for the dialogue reward mechanism, a forward dynamics model generates curiosity rewards, which are combined with extrinsic rewards from dialogue environment feedback to mitigate the reward sparsity caused by inadequate agent exploration. Second, for the dialogue strategy training mechanism, an exploration-exploitation approach inspired by organismic exploration is adopted: the agent explores the dialogue environment fully in the early stages and exploits the learned knowledge later, balancing exploration and exploitation and improving the efficiency of dialogue strategy learning. To assess the model's effectiveness, experiments are conducted on the MultiWOZ corpus in three reward environments: (1) extrinsic rewards only, (2) curiosity rewards only, and (3) a combination of both. The results show that agents using MACRM learn dialogue strategies faster than those relying on a single exploratory reward method, effectively addressing reward sparsity and delay in practical decision-making scenarios.
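The abstract names two mechanisms, a curiosity reward derived from a forward dynamics model and an early-exploration, late-exploitation schedule, but gives no formulas. The sketch below is one plausible reading, not the authors' implementation. It assumes the curiosity reward is the squared prediction error of a forward model (a common formulation in curiosity-driven reinforcement learning), that the forward model is linear, and that the curiosity weight decays linearly over training. All function names, the learning rate, and the decay schedule are hypothetical.

```python
def predict_next_state(weights, state, action):
    """Hypothetical linear forward dynamics model: next_state ~ W @ [state; action]."""
    features = state + action  # concatenate the state and action vectors (lists of floats)
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def curiosity_reward(weights, state, action, next_state):
    """Assumed intrinsic reward: half the squared prediction error of the forward model."""
    predicted = predict_next_state(weights, state, action)
    return 0.5 * sum((p - t) ** 2 for p, t in zip(predicted, next_state))


def update_forward_model(weights, state, action, next_state, lr=0.1):
    """One gradient step on the squared error, so familiar transitions become less 'curious'."""
    features = state + action
    predicted = predict_next_state(weights, state, action)
    for i, row in enumerate(weights):
        error = predicted[i] - next_state[i]
        for j in range(len(row)):
            row[j] -= lr * error * features[j]


def curiosity_weight(step, total_steps, beta_start=1.0, beta_end=0.0):
    """Assumed linear decay: weight curiosity heavily early (explore), lightly late (exploit)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return beta_start + (beta_end - beta_start) * frac


def combined_reward(extrinsic, intrinsic, beta):
    """Total reward: environment feedback plus weighted curiosity."""
    return extrinsic + beta * intrinsic


if __name__ == "__main__":
    # Toy transition: 2-dimensional dialogue state, 1-dimensional action (illustrative values).
    weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    state, action, next_state = [1.0, 0.0], [1.0], [1.0, 1.0]
    total_steps = 100
    for step in (0, 50, 100):
        intrinsic = curiosity_reward(weights, state, action, next_state)
        beta = curiosity_weight(step, total_steps)
        print(step, round(intrinsic, 4), combined_reward(-1.0, intrinsic, beta))
        update_forward_model(weights, state, action, next_state)
```

In this sketch, setting `beta` to zero throughout corresponds to the extrinsic-only environment, dropping the extrinsic term corresponds to the curiosity-only environment, and the decaying `beta` corresponds to the combined setting.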


Pages: 11
