Labeling Q-learning in hidden state environments

Cited by: 0
Authors
Hae-Yeon Lee
Hiroyuki Kamaya
Ken-ichi Abe
Affiliations
[1] Tohoku University,Department of Electric and Communication Engineering, Graduate School of Engineering
[2] Hachinohe National College of Technology,Department of Electrical Engineering
Keywords
Reinforcement learning; Labeling Q-learning; Hidden state environment; Agent; Grid-world; Partially observable Markov decision process (POMDP)
DOI
10.1007/BF02481264
Abstract
Recently, reinforcement learning (RL) methods have been applied to learning problems in environments with embedded hidden states. However, conventional RL methods are limited to Markov decision process problems. Several algorithms have been proposed to overcome hidden states, but they require an enormous amount of memory for past sequences that represent historical state transitions. The aim of this work is to extend our previously proposed algorithm for environments with hidden states, called labeling Q-learning (LQ-learning), which reinforces incompletely observed perceptions by labeling. In LQ-learning, the agent has a perception structure consisting of pairs of observations and labels. From these pairs, the agent can more precisely distinguish hidden states that look identical but are actually different from each other. Labeling is carried out by labeling functions. Numerous labeling functions can be considered, but here we introduce labeling functions based only on the sequence of the last and the current observations. This extended LQ-learning is applied to grid-world problems that contain hidden states. The simulation results demonstrate the effectiveness of LQ-learning.
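
The abstract only outlines the mechanism, so the following is a minimal, non-authoritative Python sketch of the idea: a Q-table keyed by (observation, label) pairs and a labeling function driven by the last and the current observations. The class name LQAgent, the number of labels, and the particular labeling rule are illustrative assumptions, not the authors' actual functions.

    import random
    from collections import defaultdict

    class LQAgent:
        """Minimal sketch of a labeling Q-learning (LQ-learning) style agent.
        The Q-table is indexed by (observation, label) pairs instead of raw
        observations, so perceptually aliased hidden states can be told apart.
        The labeling rule below is a hypothetical example that uses only the
        last and the current observations, as the abstract describes."""

        def __init__(self, n_actions, n_labels=4, alpha=0.1, gamma=0.9, epsilon=0.1):
            self.n_actions = n_actions
            self.n_labels = n_labels          # size of the label set (assumed)
            self.alpha = alpha                # learning rate
            self.gamma = gamma                # discount factor
            self.epsilon = epsilon            # exploration rate
            self.q = defaultdict(float)       # maps ((obs, label), action) -> Q-value
            self.prev_obs = None
            self.label = 0

        def relabel(self, obs):
            # Hypothetical labeling function: the label encodes which observation
            # preceded the current one, so identical observations reached along
            # different paths map to different (obs, label) pairs.
            self.label = 0 if self.prev_obs is None else hash(self.prev_obs) % self.n_labels
            self.prev_obs = obs
            return (obs, self.label)

        def act(self, state):
            # epsilon-greedy action selection over the labeled state
            if random.random() < self.epsilon:
                return random.randrange(self.n_actions)
            values = [self.q[(state, a)] for a in range(self.n_actions)]
            return values.index(max(values))

        def update(self, state, action, reward, next_state):
            # ordinary one-step Q-learning update applied to labeled states
            best_next = max(self.q[(next_state, a)] for a in range(self.n_actions))
            td_error = reward + self.gamma * best_next - self.q[(state, action)]
            self.q[(state, action)] += self.alpha * td_error

In a grid-world loop the agent would call relabel on each raw observation before acting and updating, so learning always operates on labeled states; everything else is standard tabular Q-learning.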
Pages: 181-184
Page count: 3
Related Papers
50 records in total
  • [1] Labeling Q-learning in POMDP environments
    Lee, HY
    Kamaya, HY
    Abe, K
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2002, E85D (09) : 1425 - 1432
  • [2] Labeling Q-learning embedded with knowledge update in partially observable MDP environments
    Lee, H
    Kamaya, H
    Abe, K
    ICCC 2004: SECOND IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL CYBERNETICS, PROCEEDINGS, 2004, : 329 - 332
  • [3] A Dynamic Hidden Forwarding Path Planning Method Based on Improved Q-Learning in SDN Environments
    Chen, Yun
    Lv, Kun
    Hu, Changzhen
    SECURITY AND COMMUNICATION NETWORKS, 2018,
  • [4] Concurrent Q-learning: Reinforcement learning for dynamic goals and environments
    Ollington, RB
    Vamplew, PW
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2005, 20 (10) : 1037 - 1052
  • [5] APPLYING Q-LEARNING TO NON-MARKOVIAN ENVIRONMENTS
    Chizhov, Jurij
    Borisov, Arkady
    ICAART 2009: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, 2009, : 306 - +
  • [6] Switching Q-learning in partially observable Markovian environments
    Kamaya, H
    Lee, H
    Abe, K
    2000 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2000), VOLS 1-3, PROCEEDINGS, 2000, : 1062 - 1067
  • [7] Dynamic Choice of State Abstraction in Q-Learning
    Tamassia, Marco
    Zambetta, Fabio
    Raffe, William L.
    Mueller, Florian 'Floyd'
    Li, Xiaodong
    ECAI 2016: 22ND EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, 285 : 46 - 54
  • [8] Q-learning in continuous state and action spaces
    Gaskett, C
    Wettergreen, D
    Zelinsky, A
    ADVANCED TOPICS IN ARTIFICIAL INTELLIGENCE, 1999, 1747 : 417 - 428
  • [9] Q-learning with adaptive state space construction
    Murao, H
    Kitamura, S
    LEARNING ROBOTS, PROCEEDINGS, 1998, 1545 : 13 - 28
  • [10] Q-Learning with adaptive state segmentation (QLASS)
    Murao, H
    Kitamura, S
    1997 IEEE INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ROBOTICS AND AUTOMATION - CIRA '97, PROCEEDINGS: TOWARDS NEW COMPUTATIONAL PRINCIPLES FOR ROBOTICS AND AUTOMATION, 1997, : 179 - 184