Universal complexity bounds based on value iteration for stochastic mean payoff games and entropy games

Cited by: 0
Authors
Allamigeon, Xavier [1,2]
Gaubert, Stephane [1,2]
Katz, Ricardo D. [3]
Skomra, Mateusz [4]
Affiliations
[1] CNRS, Ecole Polytech, INRIA, IP Paris, Palaiseau, France
[2] CNRS, Ecole Polytech, CMAP, IP Paris, Palaiseau, France
[3] Consejo Nacl Invest Cient & Tecn, CIFASIS, Bv 27 Febrero 210 bis, RA-2000 Rosario, Argentina
[4] Univ Toulouse, CNRS, LAAS, Toulouse, France
Keywords
Mean-payoff games; Entropy games; Value iteration; Perron root; Separation bounds; Parameterized complexity; DYNAMIC-PROGRAMMING RECURSIONS; PERFECT INFORMATION; OPERATOR APPROACH; ALGORITHM; EXPANSIONS; NUMBERS
DOI
10.1016/j.ic.2024.105236
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We develop value iteration-based algorithms to solve in a unified manner different classes of combinatorial zero-sum games with mean-payoff type rewards. These algorithms rely on an oracle, evaluating the dynamic programming operator up to a given precision. We show that the number of calls to the oracle needed to determine exact optimal (positional) strategies is, up to a factor polynomial in the dimension, of order R/sep, where the "separation" sep is defined as the minimal difference between distinct values arising from strategies, and R is a metric estimate, involving the norm of approximate sub- and super-eigenvectors of the dynamic programming operator. We illustrate this method by two applications. The first one is a new proof, leading to improved complexity estimates, of a theorem of Boros, Elbassioni, Gurvich and Makino, showing that turn-based mean-payoff games with a fixed number of random positions can be solved in pseudo-polynomial time. The second one concerns entropy games, a model introduced by Asarin, Cervelle, Degorre, Dima, Horn and Kozyakin. The rank of an entropy game is defined as the maximal rank among all the ambiguity matrices determined by strategies of the two players. We show that entropy games with a fixed rank, in their original formulation, can be solved in polynomial time, and that an extension of entropy games incorporating weights can be solved in pseudo-polynomial time under the same fixed rank condition. (c) 2024 Elsevier Inc. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
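The value-iteration idea in the abstract can be illustrated on a much simpler setting than the one the paper treats. The sketch below is not the authors' algorithm: it assumes a deterministic turn-based mean-payoff game with integer weights, applies the dynamic programming operator exactly rather than through an approximate oracle, and rounds the estimate to the nearest rational with denominator at most the number of positions, relying on the classical fact that the values of such games are cycle means. The names `shapley_operator`, `value_iteration`, `EDGES` and `OWNER` are hypothetical, chosen for this example only.

```python
from fractions import Fraction


def shapley_operator(x, edges, owner):
    """Apply the dynamic programming operator T once (exactly).

    edges[i] lists the moves (j, w) available at position i: go to j, pay w.
    owner[i] is "max" or "min", the player who moves at position i.
    """
    result = []
    for i, moves in enumerate(edges):
        candidates = [w + x[j] for (j, w) in moves]
        result.append(max(candidates) if owner[i] == "max" else min(candidates))
    return result


def value_iteration(edges, owner, iterations):
    """Estimate the mean payoff as T^k(0)/k, then round to a cycle mean.

    Assumption of this sketch: `iterations` is large enough that the error
    of T^k(0)/k is below half the gap between distinct candidate values
    (the role played by "sep" in the abstract).
    """
    n = len(edges)
    x = [0] * n
    for _ in range(iterations):
        x = shapley_operator(x, edges, owner)
    # Values of deterministic games are rationals with denominator <= n.
    return [Fraction(x[i], iterations).limit_denominator(n) for i in range(n)]


# Toy game (hypothetical): position 0 belongs to Max, position 1 to Min.
EDGES = [
    [(0, 1), (1, 5)],   # from 0: self-loop with payoff 1, or go to 1 with payoff 5
    [(0, -1), (1, 4)],  # from 1: go to 0 with payoff -1, or self-loop with payoff 4
]
OWNER = ["max", "min"]

values = value_iteration(EDGES, OWNER, 1000)
print(values)
```

In this toy game the best play for both sides is the cycle 0 → 1 → 0 with mean (5 − 1)/2 = 2, and the rounded iterates recover that value at both positions. The iteration count is fixed by hand here; the abstract's R/sep bound is what governs the number of operator evaluations in the general setting.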
Pages: 31
Related papers
50 in total
  • [1] Solving multichain stochastic games with mean payoff by policy iteration
    Akian, Marianne
    Cochet-Terrasson, Jean
    Detournay, Sylvie
    Gaubert, Stephane
    2013 IEEE 52ND ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2013, : 1834 - 1841
  • [2] Simple Stochastic Games, Mean Payoff Games, Parity Games
    Zwick, Uri
    COMPUTER SCIENCE - THEORY AND APPLICATIONS, 2008, 5010 : 29 - 29
  • [3] The complexity of mean payoff games on graphs
    Zwick, U
    Paterson, M
    THEORETICAL COMPUTER SCIENCE, 1996, 158 (1-2) : 343 - 359
  • [4] A policy iteration algorithm for zero-sum stochastic games with mean payoff
    Cochet-Terrasson, Jean
    Gaubert, Stephane
    COMPTES RENDUS MATHEMATIQUE, 2006, 343 (05) : 377 - 382
  • [5] The Complexity of Ergodic Mean-payoff Games
    Chatterjee, Krishnendu
    Ibsen-Jensen, Rasmus
    AUTOMATA, LANGUAGES, AND PROGRAMMING (ICALP 2014), PT II, 2014, 8573 : 122 - 133
  • [6] The Complexity of Mean-Payoff Pushdown Games
    Chatterjee, Krishnendu
    Velner, Yaron
    JOURNAL OF THE ACM, 2017, 64 (05)
  • [7] Strategy recovery for stochastic mean payoff games
    Mamino, Marcello
    THEORETICAL COMPUTER SCIENCE, 2017, 675 : 101 - 104
  • [8] Stochastic Window Mean-Payoff Games
    Doyen, Laurent
    Gaba, Pranshu
    Guha, Shibashis
    FOUNDATIONS OF SOFTWARE SCIENCE AND COMPUTATION STRUCTURES, PT I, FOSSACS 2024, 2024, 14574 : 34 - 54
  • [9] Simple Stochastic Games, Parity Games, Mean Payoff Games and Discounted Payoff Games Are All LP-Type Problems
    Nir Halman
    Algorithmica, 2007, 49 : 37 - 50
  • [10] Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems
    Halman, Nir
    ALGORITHMICA, 2007, 49 (01) : 37 - 50