Model-Free Learning of Optimal Ergodic Policies in Wireless Systems

Cited by: 7
Authors
Kalogerias, Dionysios S. [1]
Eisen, Mark [3]
Pappas, George J. [2]
Ribeiro, Alejandro [2]
Affiliations
[1] Michigan State Univ, Dept Elect & Comp Engn, E Lansing, MI 48824 USA
[2] Univ Penn, Dept Elect Syst Engn, Philadelphia, PA 19104 USA
[3] Intel Corp, Hillsboro, OR 97124 USA
Keywords
Wireless communication; Resource management; Smoothing methods; Stochastic processes; Fading channels; Approximation algorithms; Signal processing algorithms; Wireless systems; stochastic resource allocation; zeroth-order optimization; constrained nonconvex optimization; deep learning; Lagrangian duality; strong duality; RESOURCE-ALLOCATION; POWER ALLOCATION; NETWORKS; OPTIMIZATION; ACCESS
DOI
10.1109/TSP.2020.3030073
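Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract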
Learning optimal resource allocation policies in wireless systems can be effectively achieved by formulating finite dimensional constrained programs which depend on system configuration, as well as the adopted learning parameterization. The interest here is in cases where system models are unavailable, prompting methods that probe the wireless system with candidate policies, and then use observed performance to determine better policies. This generic procedure is difficult because of the need to cull accurate gradient estimates out of these limited system queries. This article constructs and exploits smoothed surrogates of constrained ergodic resource allocation problems, the gradients of the former being representable exactly as averages of finite differences that can be obtained through limited system probing. Leveraging this unique property, we develop a new model-free primal-dual algorithm for learning optimal ergodic resource allocations, while we rigorously analyze the relationships between original policy search problems and their surrogates, in both primal and dual domains. First, we show that both primal and dual domain surrogates are uniformly consistent approximations of their corresponding original finite dimensional counterparts. Upon further assuming the use of near-universal policy parameterizations, we also develop explicit bounds on the gap between optimal values of initial, infinite dimensional resource allocation problems, and dual values of their parameterized smoothed surrogates. In fact, we show that this duality gap decreases at a linear rate relative to smoothing and universality parameters. Thus, it can be made arbitrarily small at will, also justifying our proposed primal-dual algorithmic recipe. Numerical simulations confirm the effectiveness of our approach.
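The abstract's central observation, that gradients of a smoothed surrogate can be written exactly as averages of finite differences obtained by probing the system, can be illustrated with a minimal sketch. The code below assumes Gaussian smoothing, f_mu(x) = E[f(x + mu * u)] with u standard normal, which is one common choice in zeroth-order optimization; the record does not confirm that this is the exact construction used in the article. The function name `smoothed_grad_estimate`, the toy objective, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def smoothed_grad_estimate(f, x, mu, num_samples, rng):
    """Estimate the gradient of the Gaussian-smoothed surrogate
    f_mu(x) = E[f(x + mu * u)], u ~ N(0, I), using only evaluations of f.

    Assumption: Gaussian smoothing; the article may use a different scheme.
    """
    # Random probing directions, one row per system query.
    u = rng.standard_normal((num_samples, x.size))
    # Treat f as a black box: one evaluation per perturbed point.
    f_perturbed = np.array([f(x + mu * ui) for ui in u])
    # Finite differences against the unperturbed evaluation.
    diffs = (f_perturbed - f(x)) / mu
    # Average of (finite difference) * (direction) over all probes.
    return (diffs[:, None] * u).mean(axis=0)


if __name__ == "__main__":
    # Toy black-box objective (hypothetical): f(x) = ||x||^2.
    # Its Gaussian-smoothed surrogate has gradient 2x, so the estimate
    # should approach 2x as the number of probes grows.
    rng = np.random.default_rng(0)
    x = np.array([1.0, -2.0, 0.5])
    g = smoothed_grad_estimate(lambda z: float(np.sum(z ** 2)), x, 0.1, 20000, rng)
    print("estimate:", g)
    print("reference 2x:", 2 * x)
```

In a primal-dual scheme of the kind the abstract describes, an estimate like this would stand in for the unavailable model-based gradient in the primal update; that step is not sketched here because the record gives no algorithmic details.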
Pages: 6272-6286
Page count: 15