Prioritized Experience Replay based on Multi-armed Bandit

Cited by: 12
Authors
Liu, Ximing [1 ]
Zhu, Tianqing [2 ]
Jiang, Cuiqing [1 ]
Ye, Dayong [2 ]
Zhao, Fuqing [3 ]
Affiliations
[1] Hefei Univ Technol, Sch Management, Hefei, Anhui, Peoples R China
[2] Univ Technol Sydney, Sch Comp Sci, Sydney, NSW, Australia
[3] Lanzhou Univ Technol, Sch Comp & Commun Technol, Lanzhou 730050, Peoples R China
Keywords
Deep reinforcement learning; Q-learning; Deep Q-network; Experience replay; Multi-armed Bandit;
DOI
10.1016/j.eswa.2021.116023
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Experience replay is widely used in deep reinforcement learning: it allows online reinforcement learning agents to remember and reuse past experiences. To further improve sampling efficiency, the most useful experiences should be sampled more frequently. Existing methods design their sampling strategies around a few criteria, but they tend to combine those criteria in a linear or fixed manner, so the resulting strategy is static and independent of the learning agent. This ignores the dynamic nature of the environment and can only lead to suboptimal performance. In this work, we propose a dynamic experience replay strategy driven by the interaction between the agent and the environment, called Prioritized Experience Replay based on Multi-armed Bandit (PERMAB). PERMAB adaptively combines multiple priority criteria to measure the importance of each experience. In particular, the weight of each criterion is adjusted from episode to episode according to its contribution to the agent's performance, which guarantees that criteria useful in the current state are weighted more heavily. The proposed replay strategy takes both sample informativeness and diversity into consideration, which significantly boosts the learning ability and speed of the game agent. Experimental results show that PERMAB accelerates network learning and achieves better performance than baseline algorithms on seven benchmark environments of varying difficulty.
Pages: 11
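
The abstract describes the core mechanism at a high level: each priority criterion acts as a bandit arm, and the mixture weights are re-tuned every episode according to how performance changed. The paper's own algorithm is not reproduced here; the following is a minimal illustrative sketch in Python, assuming two hypothetical criteria (TD-error magnitude and recency) and an EXP3-style exponential-weights update. All names (BanditWeightedReplay, eta, end_episode, and so on) are invented for illustration, not taken from the paper.

# Minimal sketch of a bandit-weighted prioritized replay buffer.
# Not the authors' code: criteria, update rule, and hyperparameters
# are illustrative assumptions.
import numpy as np

class BanditWeightedReplay:
    """Replay buffer whose sampling priority is a bandit-weighted
    mix of criteria. This sketch hard-codes two criteria."""

    def __init__(self, capacity=10000, eta=0.1):
        self.capacity = capacity
        self.buffer = []            # stored transitions
        self.td_errors = []         # |TD error| per transition (criterion 1)
        self.eta = eta              # bandit learning rate (assumed value)
        self.log_weights = np.zeros(2)  # one arm per criterion
        self.prev_return = None     # episodic return of the previous episode

    def add(self, transition, td_error):
        # Drop the oldest transition once capacity is reached.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.td_errors.pop(0)
        self.buffer.append(transition)
        self.td_errors.append(abs(td_error))

    def _criterion_scores(self):
        # Each criterion returns a probability vector over the buffer.
        n = len(self.buffer)
        td = np.asarray(self.td_errors)
        td = td / (td.sum() + 1e-8)               # criterion 1: TD-error priority
        recency = np.arange(1, n + 1, dtype=float)
        recency = recency / recency.sum()         # criterion 2: favour recent samples
        return np.stack([td, recency])            # shape (2, n)

    def sample(self, batch_size):
        # Softmax over log-weights gives the criterion mixture.
        w = np.exp(self.log_weights - self.log_weights.max())
        w = w / w.sum()
        probs = w @ self._criterion_scores()      # convex combination of criteria
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx]

    def end_episode(self, episodic_return):
        # Credit each criterion with the return improvement, scaled by
        # how much it was used, then update exponential weights.
        if self.prev_return is not None:
            gain = episodic_return - self.prev_return
            w = np.exp(self.log_weights - self.log_weights.max())
            w = w / w.sum()
            self.log_weights += self.eta * gain * w
        self.prev_return = episodic_return

In a DQN-style training loop, one would call add() after each environment step, sample() for each gradient update, and end_episode() with the episodic return once an episode finishes. Because the mixture is a softmax over log-weights and each criterion emits a probability vector, the combined sampling distribution is always valid, and a criterion that stops contributing to return improvement is gradually down-weighted.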