Best-of-Both-Worlds Algorithms for Partial Monitoring
Citations: 0
Authors:
Tsuchiya, Taira [1,2]
Ito, Shinji [3]
Honda, Junya [1,2]
Affiliations:
[1] Kyoto Univ, Kyoto, Japan
[2] RIKEN AIP, Tokyo, Japan
[3] NEC Corp Ltd, Tokyo, Japan
Keywords:
partial monitoring;
best-of-both-worlds;
follow-the-regularized-leader;
stochastic regime with adversarial corruptions;
REGRET;
DOI:
Not available
Chinese Library Classification number:
TP [Automation Technology, Computer Technology];
Discipline classification code:
0812;
Abstract:
This study considers the partial monitoring problem with $k$ actions and $d$ outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded in both the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is $O(m^2 k^4 \log(T) \log(k_\Pi T) / \Delta_{\min})$ in the stochastic regime and $O(m k^{3/2} \sqrt{T \log(T) \log k_\Pi})$ in the adversarial regime, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\Delta_{\min}$ is the minimum suboptimality gap, and $k_\Pi$ is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is $O(c_{\mathcal{G}}^2 \log(T) \log(k_\Pi T) / \Delta_{\min}^2)$ in the stochastic regime and $O((c_{\mathcal{G}}^2 \log(T) \log(k_\Pi T))^{1/3} T^{2/3})$ in the adversarial regime, where $c_{\mathcal{G}}$ is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader framework and are inspired by the approach of exploration by optimization and the adaptive learning rate in the field of online learning with feedback graphs.
Pages: 1484-1515
Number of pages: 32