Solving semi-Markov decision problems using average reward reinforcement learning

Cited by: 0
Authors: Das, T.K.; Gosavi, A.; Mahadevan, S.; Marchalleck, N.
Affiliation: Dept. of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL 33620, United States
Source: Management Science, 1999, 45 (04): 560-574
DOI: not available
Abstract: not available
Related papers (50 items):
  • [1] Solving semi-Markov decision problems using average reward reinforcement learning
    Das, TK
    Gosavi, A
    Mahadevan, S
    Marchalleck, N
    MANAGEMENT SCIENCE, 1999, 45 (04) : 560 - 574
  • [2] Average Reward Reinforcement Learning for Semi-Markov Decision Processes
    Yang, Jiayuan
    Li, Yanjie
    Chen, Haoyao
    Li, Jiangang
    NEURAL INFORMATION PROCESSING, ICONIP 2017, PT I, 2017, 10634 : 768 - 777
  • [3] RVI Reinforcement Learning for Semi-Markov Decision Processes with Average Reward
    Li, Yanjie
    Cao, Fang
    2010 8TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2010, : 1674 - 1679
  • [4] A simulation-based learning automata framework for solving semi-Markov decision problems under long-run average reward
    Gosavi, A
    Das, TK
    Sarkar, S
    IIE TRANSACTIONS, 2004, 36 (06) : 557 - 567
  • [5] On average reward semi-markov decision processes with a general multichain structure
    Jianyong, L
    Xiaobo, Z
    MATHEMATICS OF OPERATIONS RESEARCH, 2004, 29 (02) : 339 - 352
  • [6] A Unified Approach for Semi-Markov Decision Processes with Discounted and Average Reward Criteria
    Li, Yanjie
    Wang, Huijing
    Chen, Haoyao
    2014 11TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2014, : 1741 - 1744
  • [7] An Inverse Reinforcement Learning Algorithm for semi-Markov Decision Processes
    Tan, Chuanfang
    Li, Yanjie
    Cheng, Yuhu
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 1256 - 1261
  • [9] AVERAGE COST SEMI-MARKOV DECISION PROCESSES
    ROSS, SM
    JOURNAL OF APPLIED PROBABILITY, 1970, 7 (03) : 649 - &
  • [10] Adaptive aggregation for reinforcement learning in average reward Markov decision processes
    Ortner, Ronald
    ANNALS OF OPERATIONS RESEARCH, 2013, 208 : 321 - 336