50 entries in total
- [21] Maximal Average-Reward Policies for Semi-Markov Decision Processes with Arbitrary State and Action Space. Annals of Mathematical Statistics, 1971, 42(5): 1717-&
- [25] Adaptive Aggregation for Reinforcement Learning in Average Reward Markov Decision Processes. Annals of Operations Research, 2013, 208: 321-336
- [28] Average Reward Reinforcement Learning for Semi-Markov Decision Processes. Neural Information Processing, ICONIP 2017, Pt I, 2017, 10634: 768-777