A Replaceable Curiosity-Driven Candidate Agent Exploration Approach for Task-Oriented Dialog Policy Learning

Cited by: 0
Authors
Niu, Xuecheng [1 ]
Ito, Akinori [1 ]
Nose, Takashi [1 ]
Affiliations
[1] Tohoku Univ, Grad Sch Engn, Sendai 9808577, Japan
Source
IEEE ACCESS | 2024, Vol. 12
Funding
Japan Science and Technology Agency (JST); Japan Society for the Promotion of Science (JSPS)
Keywords
Training; Reinforcement learning; Planning; Market research; Optimization; Motion pictures; Indium tin oxide; Multi-agent systems; Dialog management; reinforcement learning; deep Dyna-Q; curiosity; multi-agent optimization;
DOI
10.1109/ACCESS.2024.3462719
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Task-oriented dialog policy learning is often formulated as a Reinforcement Learning (RL) problem in which rewards from the environment are extremely sparse: an agent acting randomly will rarely encounter a reward. Exploration techniques are therefore of primary importance when solving RL problems, and more sophisticated exploration methods must be devised. In this study, we propose a replaceable curiosity-driven candidate agent exploration approach that encourages the agent to balance its action sampling and explore new environments without significantly deviating from learned dialog strategies. Within this framework, we adopt the curiosity model but introduce a weight on the curiosity reward to balance exploration and exploitation. We design a multi-candidate agent mechanism that selects, for formal dialog training, the agent with relatively balanced action sampling, motivating agents to escape pseudo-optimal actions in the early training stage. In addition, we propose, for the first time, a replacement mechanism that prevents the elected agent from performing poorly in the later stages of training and fully utilizes all candidate agents. The experimental results show that the adjustable curiosity reward promotes dialog policy convergence, and that the agent replacement mechanism effectively halts the training of poorly performing agents, significantly increasing the average task success rate and reducing the number of dialog turns. Compared with the baselines, the replaceable curiosity-driven candidate agent exploration approach achieves a higher average success rate of 0.714 and a lower average number of turns of 20.6.
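To make the mechanics concrete, below is a minimal Python sketch of how the abstract's three ideas could fit together: a curiosity bonus scaled by an adjustable weight, candidate-agent filtering by the balance (entropy) of action sampling, and mid-training replacement of an underperforming elected agent. This is an illustration, not the paper's deep Dyna-Q implementation: the toy chain environment, the count-based bonus standing in for a learned curiosity model, and all names and thresholds (QAgent, eta, the entropy filter, the 20-episode success window) are assumptions made for the example.

import math
import random
from collections import Counter

N_ACTIONS = 4

class QAgent:
    """Minimal tabular Q-learning agent (a stand-in for a dialog policy)."""
    def __init__(self, n_states=8, eps=0.2, alpha=0.1, gamma=0.9):
        self.q = [[0.0] * N_ACTIONS for _ in range(n_states)]
        self.eps, self.alpha, self.gamma = eps, alpha, gamma
        self.action_counts = Counter()  # used to measure sampling balance

    def act(self, s):
        if random.random() < self.eps:                   # epsilon-greedy exploration
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: self.q[s][i])
        self.action_counts[a] += 1
        return a

    def update(self, s, a, r, s2):
        target = r + self.gamma * max(self.q[s2])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

    def sampling_entropy(self):
        """Entropy of the empirical action distribution; higher = more balanced."""
        total = sum(self.action_counts.values()) or 1
        return -sum((c / total) * math.log(c / total)
                    for c in self.action_counts.values() if c > 0)

def curiosity_bonus(visits, s, a):
    """Count-based novelty bonus, standing in for a learned curiosity model."""
    visits[(s, a)] += 1
    return 1.0 / math.sqrt(visits[(s, a)])

def run_episode(agent, visits, eta):
    """One episode on a toy chain MDP with a sparse terminal reward.
    The agent learns from r_ext + eta * bonus (eta = curiosity weight)."""
    s, success = 0, 0.0
    for _ in range(30):
        a = agent.act(s)
        s2 = min(s + 1, 7) if a == 3 else max(s - 1, 0)  # only action 3 advances
        r_ext = 1.0 if s2 == 7 else 0.0                  # sparse extrinsic reward
        agent.update(s, a, r_ext + eta * curiosity_bonus(visits, s, a), s2)
        s = s2
        if r_ext > 0:
            success = 1.0
            break
    return success

random.seed(0)
visits = Counter()

# Candidate filtering: warm up several agents, elect the one whose action
# sampling is most balanced (highest entropy); the rest become backups.
candidates = [QAgent() for _ in range(4)]
for agent in candidates:
    for _ in range(20):
        run_episode(agent, visits, eta=0.5)
candidates.sort(key=QAgent.sampling_entropy, reverse=True)
elected, backups = candidates[0], candidates[1:]

# Formal training with an annealed curiosity weight and a replacement rule:
# if the elected agent's recent success rate collapses, swap in a backup.
recent = []
for episode in range(200):
    eta = 0.5 * (1 - episode / 200)           # adjustable curiosity weight
    recent = (recent + [run_episode(elected, visits, eta)])[-20:]
    if len(recent) == 20 and sum(recent) / 20 < 0.1 and backups:
        elected, recent = backups.pop(0), []  # replacement mechanism
print("final success rate over last window:", sum(recent) / max(len(recent), 1))

Annealing eta over training reflects the weighted curiosity reward shifting the agent from exploration toward exploitation, while the entropy-based filter and the swap loop play the roles of the candidate-selection and replacement mechanisms, respectively.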
Pages: 142640-142650 (11 pages)
Related Papers (50 total)
  • [1] A Multi-Agent Approach to Modeling Task-Oriented Dialog Policy Learning
    Liang, Songfeng
    Xu, Kai
    Dong, Zhurong
    IEEE ACCESS, 2025, 13 : 11754 - 11764
  • [2] Curiosity-driven Exploration in Reinforcement Learning
    Gregor, Michal
    Spalek, Juraj
    2014 ELEKTRO, 2014, : 435 - 440
  • [3] Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration
    Zheng, Lulu
    Chen, Jiarui
    Wang, Jianhao
    He, Jiamin
    Hu, Yujing
    Chen, Yingfeng
    Fan, Changjie
    Gao, Yang
    Zhang, Chongjie
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [4] Curiosity-driven Exploration for Cooperative Multi-Agent Reinforcement Learning
    Xu, Fanchao
    Kaneko, Tomoyuki
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [5] Humans monitor learning progress in curiosity-driven exploration
    Ten, Alexandr
    Kaushik, Pramod
    Oudeyer, Pierre-Yves
    Gottlieb, Jacqueline
    NATURE COMMUNICATIONS, 2021, 12 (01)
  • [6] Scheduled Dialog Policy Learning: An Automatic Curriculum Learning Framework for Task-oriented Dialog System
    Liu, Sihong
    Zhang, Jinchao
    He, Keqing
    Xu, Weiran
    Zhou, Jie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 1091 - 1102
  • [7] Random curiosity-driven exploration in deep reinforcement learning
    Li, Jing
    Shi, Xinxin
    Li, Jiehao
    Zhang, Xin
    Wang, Junzheng
    NEUROCOMPUTING, 2020, 418 : 139 - 147
  • [8] Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented Visual Dialog
    Zhang, Jiaping
    Zhao, Tiancheng
    Yu, Zhou
    19TH ANNUAL MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE (SIGDIAL 2018), 2018, : 140 - 150