Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach

Cited: 31
Authors:
Li, Yingying [1]
Tang, Yujie [1]
Zhang, Runyu [1]
Li, Na [1]
Affiliation:
[1] Harvard Univ, John A Paulson Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Funding:
US National Science Foundation
Keywords:
Distributed reinforcement learning (RL); linear quadratic regulator (LQR); zero-order optimization; MULTIAGENT SYSTEMS
DOI:
10.1109/TAC.2021.3128592
CLC classification:
TP [automation technology, computer technology]
Discipline code:
0812
Abstract:
This article considers a distributed reinforcement learning problem for decentralized linear quadratic (LQ) control with partial state observations and local costs. We propose a zero-order distributed policy optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging ideas from policy gradient, zero-order optimization, and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus and then conducts local policy gradient updates in parallel based on zero-order gradient estimation. ZODPO requires only limited communication and storage, even in large-scale systems. Further, we investigate the nonasymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial in the inverse of the error tolerance and in the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated throughout ZODPO are stabilizing with high probability. Finally, we numerically test ZODPO on multizone HVAC systems.
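The abstract's pipeline (perturb local policies, agree on the global cost by consensus, then take parallel zero-order gradient steps) can be sketched in a toy form. This is an illustrative sketch, not the paper's ZODPO: the cost below is a simple quadratic rather than an LQ closed-loop cost, a two-point gradient estimate is used for clarity where ZODPO works with a single-trajectory one-point estimate, and all names, gains, and constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the paper's setting: two agents, each tuning one scalar
# controller gain K[i]; the global cost is the average of smooth local
# costs (a plain quadratic here, not an actual LQ closed-loop cost).
TARGETS = np.array([0.5, -0.3])   # hypothetical optimal gains

def local_cost(i, K):
    return (K[i] - TARGETS[i]) ** 2

def consensus_average(values, W, rounds=10):
    """Each agent repeatedly averages with its neighbors (doubly stochastic
    W), so every entry converges to the network-wide mean cost."""
    x = np.asarray(values, dtype=float)
    for _ in range(rounds):
        x = W @ x
    return x

W = np.array([[0.5, 0.5],
              [0.5, 0.5]])        # doubly stochastic mixing matrix
K = np.zeros(2)                   # initial local gains
r, eta = 0.1, 0.2                 # smoothing radius, step size

for _ in range(500):
    # 1) Each agent privately perturbs its own parameter.
    u = rng.choice([-1.0, 1.0], size=2)
    # 2) Local costs at the perturbed policies; consensus gives every agent
    #    an estimate of the global (average) cost.
    J_plus = consensus_average([local_cost(i, K + r * u) for i in range(2)], W)
    J_minus = consensus_average([local_cost(i, K - r * u) for i in range(2)], W)
    # 3) Two-point zero-order gradient estimate, applied in parallel.
    #    (ZODPO itself uses a one-point, single-trajectory estimate.)
    K -= eta * ((J_plus - J_minus) / (2 * r)) * u
```

For this quadratic toy cost the two-point estimate is exact along the sampled direction, so the gains converge to the targets; the paper's analysis handles the far noisier single-trajectory setting with partial observations.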
Pages: 6429 - 6444
Page count: 16
Related papers (50 listed):
  • [1] Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach (Extended Abstract)
    Li, Yingying
    Tang, Yujie
    Zhang, Runyu
    Li, Na
    LEARNING FOR DYNAMICS AND CONTROL, VOL 120, 2020, 120 : 814 - 814
  • [2] Derivative-free methods for policy optimization: Guarantees for linear quadratic systems
    Malik, Dhruv
    Pananjady, Ashwin
    Bhatia, Kush
    Khamaru, Koulik
    Bartlett, Peter L.
    Wainwright, Martin J.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [3] Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems
    Malik, Dhruv
    Pananjady, Ashwin
    Bhatia, Kush
    Khamaru, Koulik
    Bartlett, Peter L.
    Wainwright, Martin J.
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89
  • [5] Derivative-Free Method For Composite Optimization With Applications To Decentralized Distributed Optimization
    Beznosikov, Aleksandr
    Gorbunov, Eduard
    Gasnikov, Alexander
    IFAC PAPERSONLINE, 2020, 53 (02): : 4038 - 4043
  • [6] Derivative-free reinforcement learning: a review
    Qian, Hong
    Yu, Yang
    FRONTIERS OF COMPUTER SCIENCE, 2021, 15 (06): 44 - 62
  • [9] Reinforcement Learning with Derivative-Free Exploration
    Chen, Xiong-Hui
    Yu, Yang
    AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1880 - 1882
  • [10] Combining Local and Global Direct Derivative-free Optimization for Reinforcement Learning
    Leonetti, Matteo
    Kormushev, Petar
    Sagratella, Simone
    CYBERNETICS AND INFORMATION TECHNOLOGIES, 2012, 12 (03) : 53 - 65