Learning Optimal Behavior in Environments with Non-stationary Observations

Cited by: 0
Authors
Boone, Ilio [1 ]
Rens, Gavin [1 ]
Affiliations
[1] Katholieke Univ Leuven, DTAI Grp, Leuven, Belgium
Keywords
Markov Decision Process; Non-Markovian Reward Models; Mealy Reward Model (MRM); Learning MRMs; Non-stationary
DOI
10.5220/0010898200003116
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In sequential decision-theoretic systems, the dynamics may be Markovian (behavior at the next step is independent of the past, given the present) or non-Markovian (behavior at the next step depends on the past). One approach to representing non-Markovian behavior is to employ deterministic finite automata (DFA) with inputs and outputs, such as Mealy machines, and several researchers have proposed frameworks for learning such DFA-based models. There are at least two reasons for a system to be non-Markovian: (i) rewards are gained from temporally-dependent tasks, and (ii) observations are non-stationary. Rens et al. (2021) tackle learning the applicable DFA for the first case with their ARM algorithm, but ARM cannot deal with the second case. Toro Icarte et al. (2019) tackle the second case with their LRM algorithm. In this paper, we extend ARM to handle the second case as well. The advantage of ARM for learning and acting in non-Markovian systems is that it is based on well-understood formal methods with many available tools.
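For intuition, the following minimal Python sketch illustrates the idea behind a Mealy-machine reward model in the spirit of the MRM named in the keywords: a DFA that reads observations and emits rewards on its transitions, so the reward can depend on the history of observations rather than on the current observation alone. The key/goal task and all identifiers are illustrative assumptions, not taken from the paper.

```python
# Sketch of a Mealy-machine reward model (illustrative only; not the
# paper's implementation). Transitions map (state, observation) pairs
# to (next_state, reward) pairs, so rewards are history-dependent.

class MealyRewardMachine:
    def __init__(self, init_state, transitions):
        # transitions: dict mapping (state, observation) -> (next_state, reward)
        self.init_state = init_state
        self.transitions = transitions
        self.state = init_state

    def reset(self):
        self.state = self.init_state

    def step(self, observation):
        """Consume one observation; emit a reward and advance the machine."""
        self.state, reward = self.transitions[(self.state, observation)]
        return reward


# Hypothetical temporally-dependent task: reward 1 only for reaching the
# goal *after* picking up a key. A Markovian reward over single
# observations cannot express this, but a two-state Mealy machine can.
mrm = MealyRewardMachine(
    init_state="no_key",
    transitions={
        ("no_key", "key"):    ("has_key", 0.0),
        ("no_key", "goal"):   ("no_key",  0.0),
        ("no_key", "empty"):  ("no_key",  0.0),
        ("has_key", "key"):   ("has_key", 0.0),
        ("has_key", "goal"):  ("no_key",  1.0),
        ("has_key", "empty"): ("has_key", 0.0),
    },
)

rewards = [mrm.step(obs) for obs in ["empty", "key", "empty", "goal"]]
print(rewards)  # [0.0, 0.0, 0.0, 1.0]
```

Note how observing "goal" yields a different reward depending on whether "key" was seen earlier; this history dependence is exactly the non-Markovian behavior the abstract describes.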
Pages: 729-736
Page count: 8