Learning Optimal Behavior in Environments with Non-stationary Observations

Cited by: 0
Authors
Boone, Ilio [1 ]
Rens, Gavin [1 ]
Affiliations
[1] Katholieke Univ Leuven, DTAI Grp, Leuven, Belgium
Keywords
Markov Decision Process; Non-Markovian Reward Models; Mealy Reward Model (MRM); Learning MRMs; Non-stationary
DOI
10.5220/0010898200003116
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In sequential decision-theoretic systems, the dynamics may be Markovian (behavior at the next step is independent of the past, given the present) or non-Markovian (behavior at the next step depends on the past). One approach to representing non-Markovian behavior is to employ deterministic finite automata (DFA) with inputs and outputs, e.g., Mealy machines; moreover, several researchers have proposed frameworks for learning such DFA-based models. There are at least two reasons for a system to be non-Markovian: (i) rewards are gained from temporally dependent tasks, and (ii) observations are non-stationary. Rens et al. (2021) tackle learning the applicable DFA for the first case with their ARM algorithm, but ARM cannot deal with the second case. Toro Icarte et al. (2019) tackle the second case with their LRM algorithm. In this paper, we extend ARM to handle the second case as well. The advantage of ARM for learning and acting in non-Markovian systems is that it is based on well-understood formal methods with many available tools.
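To make the DFA-with-outputs idea concrete, the Python sketch below shows a toy Mealy reward machine whose emitted reward depends on the history of observation labels, not only on the current one. The two-state "key then goal" task, the labels, and all names are illustrative assumptions, not constructs from the paper.

# Illustrative sketch (not code from the paper): a Mealy machine used as a
# reward model. Transitions read observation labels and emit rewards, so the
# reward can depend on history even when the environment itself is Markovian.
class MealyRewardMachine:
    def __init__(self, initial, delta):
        # delta maps (machine_state, observation_label) -> (next_state, reward)
        self.initial = initial
        self.delta = delta
        self.state = initial

    def reset(self):
        self.state = self.initial

    def step(self, label):
        # Transition on the observed label and emit the associated reward.
        self.state, reward = self.delta[(self.state, label)]
        return reward

# Hypothetical temporally-dependent task: reward 1.0 for reaching "goal",
# but only after "key" has been observed earlier in the episode.
delta = {
    ("u0", "key"):  ("u1", 0.0),
    ("u0", "goal"): ("u0", 0.0),  # goal before key: no reward
    ("u0", "none"): ("u0", 0.0),
    ("u1", "key"):  ("u1", 0.0),
    ("u1", "goal"): ("u1", 1.0),  # goal after key: reward
    ("u1", "none"): ("u1", 0.0),
}

mrm = MealyRewardMachine("u0", delta)
print([mrm.step(lbl) for lbl in ["goal", "key", "none", "goal"]])
# -> [0.0, 0.0, 0.0, 1.0]

Augmenting the environment state with the machine state (u0/u1 here) renders the joint process Markovian again, which is what makes standard reinforcement learning applicable once such a machine has been learned.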
Pages: 729-736 (8 pages)