RISK-SENSITIVE DECISION MAKING VIA CONSTRAINED EXPECTED RETURNS

Cited by: 0
Authors
Hahn, Juergen [1 ]
Zoubir, Abdelhak M. [1 ]
Affiliations
[1] Tech Univ Darmstadt, Signal Proc Grp, Merckstr 25, D-64283 Darmstadt, Germany
Keywords
Markov decision process; Risk; Decision making; Constrained optimization; Reinforcement learning
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Decision making based on Markov decision processes (MDPs) is an emerging research area, as MDPs provide a convenient formalism for learning an optimal behavior with respect to a given reward. In many applications there are critical states that might harm the agent or the environment and should therefore be avoided. In practice, those states are often simply penalized with a negative reward whose magnitude is set by trial and error. For this reason, we propose a modification of the well-known value iteration algorithm that guarantees that critical states are visited only with a pre-set probability. Since the resulting constrained problem can be infeasible, we investigate nonlinear and linear approximations and discuss their effects. Two examples demonstrate the effectiveness of the proposed approach.
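The trial-and-error baseline the abstract criticizes can be sketched as follows: standard value iteration on a toy chain MDP where the critical state is handled by a hand-tuned negative reward. This is a minimal illustration only, not the paper's constrained method; the 5-state chain, discount factor, and penalty value are all hypothetical choices for the sketch.

```python
import numpy as np

# Toy 5-state chain: states 0..4, goal at state 4, "critical" state at 2.
# Moving left/right is deterministic; positions clamp at the chain's ends.
n_states, gamma = 5, 0.9
actions = (-1, +1)
goal, critical = 4, 2
penalty = -10.0  # hand-tuned trial-and-error penalty (hypothetical value)

def step(s, a):
    """Next state after taking action a in state s (clamped to the chain)."""
    return min(max(s + a, 0), n_states - 1)

def reward(s_next):
    """Reward received on arriving in s_next."""
    if s_next == goal:
        return 1.0
    if s_next == critical:
        return penalty
    return 0.0

# Standard value iteration: repeated Bellman optimality backups
# V(s) <- max_a [ r(s') + gamma * V(s') ] until the sweep-to-sweep
# change falls below a small threshold.
V = np.zeros(n_states)
for _ in range(500):
    V_new = np.array([max(reward(step(s, a)) + gamma * V[step(s, a)]
                          for a in actions)
                      for s in range(n_states)])
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new

# Greedy policy with respect to the converged values.
greedy = [max(actions, key=lambda a: reward(step(s, a)) + gamma * V[step(s, a)])
          for s in range(n_states)]
```

With this penalty, the greedy policy in the states left of the critical state refuses to cross it (e.g. `greedy[1]` points away from the goal), while a milder penalty would let the agent through: exactly the tuning sensitivity the abstract cites as motivation for constraining the visit probability instead.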
Pages: 2569-2573
Number of pages: 5