Prioritized Experience Replay based on Multi-armed Bandit

Cited by: 12
Authors
Liu, Ximing [1 ]
Zhu, Tianqing [2 ]
Jiang, Cuiqing [1 ]
Ye, Dayong [2 ]
Zhao, Fuqing [3 ]
Affiliations
[1] Hefei Univ Technol, Sch Management, Hefei, Anhui, Peoples R China
[2] Univ Technol Sydney, Sch Comp Sci, Sydney, NSW, Australia
[3] Lanzhou Univ Technol, Sch Comp & Commun Technol, Lanzhou 730050, Peoples R China
Keywords
Deep reinforcement learning; Q-learning; Deep Q-network; Experience replay; Multi-armed Bandit;
DOI
10.1016/j.eswa.2021.116023
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Experience replay is widely used in deep reinforcement learning: it allows online reinforcement learning agents to remember and reuse past experiences. To further improve sampling efficiency, the most useful experiences should be sampled more frequently. Existing methods design their sampling strategies around a few criteria, but they tend to combine those criteria in a linear or fixed manner, so the resulting strategy is static and independent of the learning agent. This ignores the dynamic nature of the environment and can only lead to suboptimal performance. In this work, we propose a dynamic experience replay strategy driven by the interaction between the agent and the environment, called Prioritized Experience Replay based on Multi-armed Bandit (PERMAB). PERMAB adaptively combines multiple priority criteria to measure the importance of each experience. In particular, the weight of each criterion is adjusted from episode to episode according to its contribution to the agent's performance, which guarantees that criteria useful in the current state are weighted more heavily. The proposed replay strategy takes both sample informativeness and diversity into consideration, which significantly boosts the learning ability and speed of the game agent. Experimental results show that PERMAB accelerates network learning and achieves better performance than baseline algorithms on seven benchmark environments of varying difficulty.
Pages: 11
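
The abstract describes the core mechanism at a high level: each priority criterion acts as a bandit arm, and the mixture weights are re-tuned every episode according to how performance changed. The paper's own algorithm is not reproduced here; the following is a minimal illustrative sketch in Python, assuming two hypothetical criteria (TD-error magnitude and recency) and an EXP3-style exponential-weights update. All names (BanditWeightedReplay, eta, end_episode, and so on) are invented for illustration, not taken from the paper.

# Minimal sketch of a bandit-weighted prioritized replay buffer.
# Not the authors' code: criteria, update rule, and hyperparameters
# are illustrative assumptions.
import numpy as np

class BanditWeightedReplay:
    """Replay buffer whose sampling priority is a bandit-weighted
    mix of criteria. This sketch hard-codes two criteria."""

    def __init__(self, capacity=10000, eta=0.1):
        self.capacity = capacity
        self.buffer = []            # stored transitions
        self.td_errors = []         # |TD error| per transition (criterion 1)
        self.eta = eta              # bandit learning rate (assumed value)
        self.log_weights = np.zeros(2)  # one arm per criterion
        self.prev_return = None     # episodic return of the previous episode

    def add(self, transition, td_error):
        # Drop the oldest transition once capacity is reached.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.td_errors.pop(0)
        self.buffer.append(transition)
        self.td_errors.append(abs(td_error))

    def _criterion_scores(self):
        # Each criterion returns a probability vector over the buffer.
        n = len(self.buffer)
        td = np.asarray(self.td_errors)
        td = td / (td.sum() + 1e-8)               # criterion 1: TD-error priority
        recency = np.arange(1, n + 1, dtype=float)
        recency = recency / recency.sum()         # criterion 2: favour recent samples
        return np.stack([td, recency])            # shape (2, n)

    def sample(self, batch_size):
        # Softmax over log-weights gives the criterion mixture.
        w = np.exp(self.log_weights - self.log_weights.max())
        w = w / w.sum()
        probs = w @ self._criterion_scores()      # convex combination of criteria
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx]

    def end_episode(self, episodic_return):
        # Credit each criterion with the return improvement, scaled by
        # how much it was used, then update exponential weights.
        if self.prev_return is not None:
            gain = episodic_return - self.prev_return
            w = np.exp(self.log_weights - self.log_weights.max())
            w = w / w.sum()
            self.log_weights += self.eta * gain * w
        self.prev_return = episodic_return

In a DQN-style training loop, one would call add() after each environment step, sample() for each gradient update, and end_episode() with the episodic return once an episode finishes. Because the mixture is a softmax over log-weights and each criterion emits a probability vector, the combined sampling distribution is always valid, and a criterion that stops contributing to return improvement is gradually down-weighted.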