Learning to Model Opponent Learning (Student Abstract)

Cited by: 0

Authors:

Davies, Ian [1]

Tian, Zheng [1]

Wang, Jun [1]

Affiliation:

[1] UCL, Gower St, London WC1E 6BT, England

Source:

THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2020, Vol. 34

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment. The adaptation and learning of other agents induces non-stationarity in the environment dynamics. This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment. Policy search algorithms also struggle in multi-agent settings as the partial observability resulting from an opponent's actions not being known introduces high variance to policy training. Modelling an agent's opponent(s) is often pursued as a means of resolving the issues arising from the coexistence of learning opponents. An opponent model provides an agent with some ability to reason about other agents to aid its own decision making. Most prior works learn an opponent model by assuming the opponent is employing a stationary policy or switching between a set of stationary policies. Such an approach can reduce the variance of training signals for policy search algorithms. However, in the multi-agent setting, agents have an incentive to continually adapt and learn. This means that the assumptions concerning opponent stationarity are unrealistic. In this work, we develop a novel approach to modelling an opponent's learning dynamics which we term Learning to Model Opponent Learning (LeMOL). We show our structured opponent model is more accurate and stable than naive behaviour cloning baselines. We further show that opponent modelling can improve the performance of algorithmic agents in multi-agent settings.

Pages: 13771-13772

Page count: 2
