Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling

Cited by: 1
Author
Lin, Baihan [1]
Affiliation
[1] Columbia Univ, New York, NY 10027 USA
Keywords
Multi-armed bandits; genetic algorithm; MIXTURE;
DOI
10.1109/CEC55065.2022.9870279
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As two popular schools of machine learning, online learning and evolutionary computation have become important driving forces behind real-world decision-making engines in biomedicine, economics, and engineering. Although there is prior work that uses bandits to improve the optimization process of evolutionary algorithms, it remains largely unexplored how evolutionary approaches can improve the sequential decision making of online learning agents such as multi-armed bandits. In this work, we propose Genetic Thompson Sampling, a bandit algorithm that maintains a population of agents and updates them with genetic principles such as elite selection, crossover, and mutation. Empirical results in multi-armed bandit simulation environments and on a practical epidemic control problem suggest that, by incorporating a genetic algorithm into the bandit algorithm, our method significantly outperforms the baselines in nonstationary settings. Lastly, we introduce EvoBandit, a web-based interactive visualization that guides readers through the entire learning process and performs lightweight evaluations on the fly. We hope this investigation engages researchers in this growing field of research.
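The abstract outlines the core loop of Genetic Thompson Sampling: a population of bandit agents is evolved with elite selection, crossover, and mutation. The following is a minimal illustrative sketch, assuming Beta-Bernoulli Thompson Sampling agents and simple per-arm crossover and reset-to-prior mutation operators; the paper's exact operators, fitness measure, and hyperparameters are not specified in this record.

```python
import random

class TSAgent:
    """Beta-Bernoulli Thompson Sampling agent: one (alpha, beta) posterior per arm."""
    def __init__(self, n_arms):
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms

    def select_arm(self):
        # Sample a mean-reward estimate per arm, play the argmax.
        samples = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        self.alpha[arm] += reward
        self.beta[arm] += 1.0 - reward

def crossover(p1, p2):
    """Child inherits each arm's posterior from a randomly chosen parent."""
    child = TSAgent(len(p1.alpha))
    for i in range(len(p1.alpha)):
        src = p1 if random.random() < 0.5 else p2
        child.alpha[i], child.beta[i] = src.alpha[i], src.beta[i]
    return child

def mutate(agent, rate=0.1):
    """Occasionally reset an arm's posterior to the prior (re-enables exploration,
    which is what helps in nonstationary settings)."""
    for i in range(len(agent.alpha)):
        if random.random() < rate:
            agent.alpha[i], agent.beta[i] = 1.0, 1.0

def genetic_thompson_sampling(arm_probs, horizon=2000, pop_size=8,
                              elite_frac=0.25, gen_every=100):
    """Run a population of TS agents on a Bernoulli bandit; evolve every gen_every steps."""
    n_arms = len(arm_probs)
    population = [TSAgent(n_arms) for _ in range(pop_size)]
    fitness = [0.0] * pop_size
    total_reward = 0.0
    for t in range(1, horizon + 1):
        idx = random.randrange(pop_size)          # one agent acts per round
        agent = population[idx]
        arm = agent.select_arm()
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0
        agent.update(arm, reward)
        fitness[idx] += reward
        total_reward += reward
        if t % gen_every == 0:                    # evolve the population
            order = sorted(range(pop_size), key=lambda i: -fitness[i])
            n_elite = max(2, int(elite_frac * pop_size))
            elites = [population[i] for i in order[:n_elite]]
            new_pop = list(elites)                # elites survive unchanged
            while len(new_pop) < pop_size:
                child = crossover(*random.sample(elites, 2))
                mutate(child)
                new_pop.append(child)
            population = new_pop
            fitness = [0.0] * pop_size            # fitness is per generation
    return total_reward
```

Elites are copied unchanged so learned posteriors are never lost, while mutated children keep re-exploring arms; both knobs (`elite_frac`, mutation `rate`) are illustrative defaults, not values from the paper.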
Pages: 8