A Replaceable Curiosity-Driven Candidate Agent Exploration Approach for Task-Oriented Dialog Policy Learning

Cited by: 0
Authors
Niu, Xuecheng [1 ]
Ito, Akinori [1 ]
Nose, Takashi [1 ]
Affiliations
[1] Tohoku Univ, Grad Sch Engn, Sendai 9808577, Japan
Source
IEEE ACCESS | 2024, Vol. 12
Funding
Japan Science and Technology Agency (JST); Japan Society for the Promotion of Science (JSPS)
Keywords
Training; Reinforcement learning; Planning; Market research; Optimization; Motion pictures; Indium tin oxide; Multi-agent systems; Dialog management; reinforcement learning; deep Dyna-Q; curiosity; multi-agent optimization;
DOI
10.1109/ACCESS.2024.3462719
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Task-oriented dialog policy learning is often formulated as a Reinforcement Learning (RL) problem in which rewards from the environment are extremely sparse: an agent acting randomly will rarely encounter a reward. Exploration techniques are therefore of primary importance when solving RL problems, and more sophisticated exploration methods must be devised. In this study, we propose a replaceable curiosity-driven candidate agent exploration approach that encourages the agent to balance its action sampling and explore new environments without significantly deviating from learned dialog strategies. Within this framework, we adopt the curiosity model but introduce a weight on the curiosity reward to balance exploration and exploitation. We design a multi-candidate agent mechanism that selects, for formal dialog training, the agent with relatively balanced action sampling, motivating agents to escape pseudo-optimal actions in the early training stage. In addition, we propose, for the first time, a replacement mechanism that prevents the elected agent from performing poorly in the later stages of training and fully utilizes all candidate agents. The experimental results show that the adjustable curiosity reward promotes dialog policy convergence, and that the agent replacement mechanism effectively halts the training of poorly performing agents, significantly increasing the average task success rate and reducing the number of dialog turns. Compared with the baselines, the replaceable curiosity-driven candidate agent exploration approach achieves a higher average success rate of 0.714 and a lower average number of turns of 20.6.
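To make the mechanics concrete, below is a minimal Python sketch of how the abstract's three ideas could fit together: a curiosity bonus scaled by an adjustable weight, candidate-agent filtering by the balance (entropy) of action sampling, and mid-training replacement of an underperforming elected agent. This is an illustration, not the paper's deep Dyna-Q implementation: the toy chain environment, the count-based bonus standing in for a learned curiosity model, and all names and thresholds (QAgent, eta, the entropy filter, the 20-episode success window) are assumptions made for the example.

import math
import random
from collections import Counter

N_ACTIONS = 4

class QAgent:
    """Minimal tabular Q-learning agent (a stand-in for a dialog policy)."""
    def __init__(self, n_states=8, eps=0.2, alpha=0.1, gamma=0.9):
        self.q = [[0.0] * N_ACTIONS for _ in range(n_states)]
        self.eps, self.alpha, self.gamma = eps, alpha, gamma
        self.action_counts = Counter()  # used to measure sampling balance

    def act(self, s):
        if random.random() < self.eps:                   # epsilon-greedy exploration
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda i: self.q[s][i])
        self.action_counts[a] += 1
        return a

    def update(self, s, a, r, s2):
        target = r + self.gamma * max(self.q[s2])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

    def sampling_entropy(self):
        """Entropy of the empirical action distribution; higher = more balanced."""
        total = sum(self.action_counts.values()) or 1
        return -sum((c / total) * math.log(c / total)
                    for c in self.action_counts.values() if c > 0)

def curiosity_bonus(visits, s, a):
    """Count-based novelty bonus, standing in for a learned curiosity model."""
    visits[(s, a)] += 1
    return 1.0 / math.sqrt(visits[(s, a)])

def run_episode(agent, visits, eta):
    """One episode on a toy chain MDP with a sparse terminal reward.
    The agent learns from r_ext + eta * bonus (eta = curiosity weight)."""
    s, success = 0, 0.0
    for _ in range(30):
        a = agent.act(s)
        s2 = min(s + 1, 7) if a == 3 else max(s - 1, 0)  # only action 3 advances
        r_ext = 1.0 if s2 == 7 else 0.0                  # sparse extrinsic reward
        agent.update(s, a, r_ext + eta * curiosity_bonus(visits, s, a), s2)
        s = s2
        if r_ext > 0:
            success = 1.0
            break
    return success

random.seed(0)
visits = Counter()

# Candidate filtering: warm up several agents, elect the one whose action
# sampling is most balanced (highest entropy); the rest become backups.
candidates = [QAgent() for _ in range(4)]
for agent in candidates:
    for _ in range(20):
        run_episode(agent, visits, eta=0.5)
candidates.sort(key=QAgent.sampling_entropy, reverse=True)
elected, backups = candidates[0], candidates[1:]

# Formal training with an annealed curiosity weight and a replacement rule:
# if the elected agent's recent success rate collapses, swap in a backup.
recent = []
for episode in range(200):
    eta = 0.5 * (1 - episode / 200)           # adjustable curiosity weight
    recent = (recent + [run_episode(elected, visits, eta)])[-20:]
    if len(recent) == 20 and sum(recent) / 20 < 0.1 and backups:
        elected, recent = backups.pop(0), []  # replacement mechanism
print("final success rate over last window:", sum(recent) / max(len(recent), 1))

Annealing eta over training reflects the weighted curiosity reward shifting the agent from exploration toward exploitation, while the entropy-based filter and the swap loop play the roles of the candidate-selection and replacement mechanisms, respectively.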
Pages: 142640-142650 (11 pages)
Related Papers (50 total)
  • [1] A Multi-Agent Approach to Modeling Task-Oriented Dialog Policy Learning
    Liang, Songfeng
    Xu, Kai
    Dong, Zhurong
    IEEE ACCESS, 2025, 13 : 11754 - 11764
  • [2] Curiosity-driven Exploration in Reinforcement Learning
    Gregor, Michal
    Spalek, Juraj
    2014 ELEKTRO, 2014, : 435 - 440
  • [3] Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration
    Zheng, Lulu
    Chen, Jiarui
    Wang, Jianhao
    He, Jiamin
    Hu, Yujing
    Chen, Yingfeng
    Fan, Changjie
    Gao, Yang
    Zhang, Chongjie
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [4] Curiosity-driven Exploration for Cooperative Multi-Agent Reinforcement Learning
    Xu, Fanchao
    Kaneko, Tomoyuki
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [5] Humans monitor learning progress in curiosity-driven exploration
    Ten, Alexandr
    Kaushik, Pramod
    Oudeyer, Pierre-Yves
    Gottlieb, Jacqueline
    NATURE COMMUNICATIONS, 2021, 12 (01)
  • [6] Scheduled Dialog Policy Learning: An Automatic Curriculum Learning Framework for Task-oriented Dialog System
    Liu, Sihong
    Zhang, Jinchao
    He, Keqing
    Xu, Weiran
    Zhou, Jie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 1091 - 1102
  • [7] Random curiosity-driven exploration in deep reinforcement learning
    Li, Jing
    Shi, Xinxin
    Li, Jiehao
    Zhang, Xin
    Wang, Junzheng
    NEUROCOMPUTING, 2020, 418 : 139 - 147
  • [8] Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented Visual Dialog
    Zhang, Jiaping
    Zhao, Tiancheng
    Yu, Zhou
    19TH ANNUAL MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE (SIGDIAL 2018), 2018, : 140 - 150