Learning Optimal Behavior in Environments with Non-stationary Observations

Cited by: 0
Authors
Boone, Ilio [1 ]
Rens, Gavin [1 ]
Affiliations
[1] Katholieke Univ Leuven, DTAI Grp, Leuven, Belgium
Keywords
Markov Decision Process; Non-Markovian Reward Models; Mealy Reward Model (MRM); Learning MRMs; Non-stationary
DOI
10.5220/0010898200003116
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In sequential decision-theoretic systems, the dynamics may be Markovian (behavior at the next step is independent of the past, given the present) or non-Markovian (behavior at the next step depends on the past). One approach to representing non-Markovian behavior is to employ deterministic finite automata (DFA) with inputs and outputs, such as Mealy machines, and several researchers have proposed frameworks for learning such DFA-based models. There are at least two reasons for a system to be non-Markovian: (i) rewards are gained from temporally-dependent tasks, and (ii) observations are non-stationary. Rens et al. (2021) tackle learning the applicable DFA for the first case with their ARM algorithm, but ARM cannot deal with the second case. Toro Icarte et al. (2019) tackle the second case with their LRM algorithm. In this paper, we extend ARM to handle the second case as well. The advantage of ARM for learning and acting in non-Markovian systems is that it is based on well-understood formal methods with many available tools.
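For intuition, the following minimal Python sketch illustrates the idea behind a Mealy-machine reward model in the spirit of the MRM named in the keywords: a DFA that reads observations and emits rewards on its transitions, so the reward can depend on the history of observations rather than on the current observation alone. The key/goal task and all identifiers are illustrative assumptions, not taken from the paper.

```python
# Sketch of a Mealy-machine reward model (illustrative only; not the
# paper's implementation). Transitions map (state, observation) pairs
# to (next_state, reward) pairs, so rewards are history-dependent.

class MealyRewardMachine:
    def __init__(self, init_state, transitions):
        # transitions: dict mapping (state, observation) -> (next_state, reward)
        self.init_state = init_state
        self.transitions = transitions
        self.state = init_state

    def reset(self):
        self.state = self.init_state

    def step(self, observation):
        """Consume one observation; emit a reward and advance the machine."""
        self.state, reward = self.transitions[(self.state, observation)]
        return reward


# Hypothetical temporally-dependent task: reward 1 only for reaching the
# goal *after* picking up a key. A Markovian reward over single
# observations cannot express this, but a two-state Mealy machine can.
mrm = MealyRewardMachine(
    init_state="no_key",
    transitions={
        ("no_key", "key"):    ("has_key", 0.0),
        ("no_key", "goal"):   ("no_key",  0.0),
        ("no_key", "empty"):  ("no_key",  0.0),
        ("has_key", "key"):   ("has_key", 0.0),
        ("has_key", "goal"):  ("no_key",  1.0),
        ("has_key", "empty"): ("has_key", 0.0),
    },
)

rewards = [mrm.step(obs) for obs in ["empty", "key", "empty", "goal"]]
print(rewards)  # [0.0, 0.0, 0.0, 1.0]
```

Note how observing "goal" yields a different reward depending on whether "key" was seen earlier; this history dependence is exactly the non-Markovian behavior the abstract describes.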
Pages: 729-736
Page count: 8