A FAMILY OF s-RECTANGULAR ROBUST MDPs: RELATIVE CONSERVATIVENESS, ASYMPTOTIC ANALYSES, AND FINITE-SAMPLE PROPERTIES

Authors
Ramani, Sivaramakrishnan [1]
Ghate, Archis [2]
Affiliations
[1] Georgia Inst Technol, H Milton Stewart Sch Ind & Syst Engn, Atlanta, GA 30332 USA
[2] Clemson Univ, Dept Ind Engn, Clemson, SC 29634 USA
Keywords
distributionally robust optimization; dynamic programming; value convergence; probabilistic performance guarantees; sample complexity; MARKOV DECISION-PROCESSES
DOI
10.1137/23M1559920
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
We introduce a family of s-rectangular robust Markov decision processes (s-RMDPs) indexed with $\rho \in [1, \infty]$. In each state, the ambiguity set of transition probability mass functions (pmfs) across actions equals a sublevel set of the $\ell_\rho$-norm of a vector of distances from reference pmfs. Setting $\rho = \infty$ recovers (s, a)-RMDPs. For any s-RMDP from this family, there is an (s, a)-RMDP whose robust optimal value is at least as good, and vice versa. This occurs because s- and (s, a)-RMDPs can employ different ambiguity set radii, casting doubt on the belief that (s, a)-RMDPs are more conservative than s-RMDPs. If the distance is lower semicontinuous and convex, then, for any s-RMDP, there exists an (s, a)-RMDP with an identical robust optimal value. We also study data-driven s-RMDPs, where the reference pmf is constructed from state transition samples. If the distance satisfies a Pinsker-type inequality, the robust optimal and out-of-sample values both converge with sample-size to the true optimal. We derive rates of convergence and sample complexity when the distance satisfies a concentration inequality. Under this concentration inequality, the robust optimal value provides a probabilistic lower bound on the out-of-sample value. An artifact of the analyses behind these guarantees is the surprising conclusion that (s, a)-RMDPs might be the least conservative among all s-RMDPs within our family. The asymptotic and finite sample properties also extend for a class of nonrectangular RMDPs.
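As a hedged illustration of the ambiguity set described in the abstract, the state-wise set might be written as follows. The symbols $\mathcal{P}_s$, $d$, $\hat{p}_{s,a}$, $\theta_s$, and $A$ are notation assumed here for the ambiguity set, the distance, the reference pmfs, the radius, and the action set; they are not taken from the record, and each $p_{s,a}$ is understood to be a pmf over next states.

```latex
% Assumed notation: d = distance between pmfs, \hat{p}_{s,a} = reference pmf,
% \theta_s = ambiguity set radius in state s, A = action set.
\mathcal{P}_s(\theta_s)
  = \Bigl\{ (p_{s,a})_{a \in A} :
      \bigl\| \bigl( d(p_{s,a}, \hat{p}_{s,a}) \bigr)_{a \in A} \bigr\|_{\rho}
      \le \theta_s \Bigr\},
  \qquad \rho \in [1, \infty].

% With \rho = \infty the norm is a maximum over actions, so the constraint
% splits into one constraint per action and the set becomes a product:
\mathcal{P}_s(\theta_s)
  = \prod_{a \in A} \bigl\{ p_{s,a} : d(p_{s,a}, \hat{p}_{s,a}) \le \theta_s \bigr\}
  \qquad (\rho = \infty).
```

Under this notation, the product form in the second display is the (s, a)-rectangular structure that the abstract says is recovered at $\rho = \infty$.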
Pages: 1540-1568
Number of pages: 29