Multi-Modal Imitation Learning Method with Cosine Similarity

Cited by: 0

Authors:

Hao S. [1]

Liu Q. [1, 2, 3, 4]

Xu P. [1]

Zhang L. [1]

Huang Z. [1]

Affiliations:

[1] School of Computer Science and Technology, Soochow University, Jiangsu, Suzhou

[2] Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University, Nanjing

[3] Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, Ministry of Education, Changchun

[4] Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Jiangsu, Suzhou

Source:

Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2023 / Vol. 60 / No. 6

Funding:

National Natural Science Foundation of China

Keywords:

cosine similarity; generative adversarial imitation learning; inverse reinforcement learning; mode collapse; multi-modal

DOI:

10.7544/issn1000-1239.202220119


Abstract:

Generative adversarial imitation learning (GAIL) is an inverse reinforcement learning (IRL) method that uses the generative adversarial framework to imitate expert policies from expert demonstrations. In practical tasks, expert demonstrations are often generated by multi-modal policies. However, most existing GAIL methods assume that the demonstrations come from a single-modal policy, which leads to the mode collapse problem: GAIL learns only some of the modal policies. This greatly limits the application of such methods to multi-modal tasks. To address mode collapse, we propose a multi-modal imitation learning method with cosine similarity (MCS-GAIL). The method introduces an encoder and a policy group. The encoder extracts the modal features of the expert demonstrations; the cosine similarity between the features of samples drawn from the policies and those of the expert demonstrations is then computed and added to the loss function of the policy group, helping each policy learn the expert policy of its corresponding modality. In addition, MCS-GAIL uses a new min-max game formulation so that the policies in the group learn different modal policies in a complementary way. Under certain assumptions, we prove the convergence of MCS-GAIL theoretically. To verify its effectiveness, MCS-GAIL is implemented on the Grid World and MuJoCo platforms and compared with existing methods for mitigating mode collapse. The experimental results show that MCS-GAIL effectively learns multiple modal policies in all environments with high accuracy and stability. © 2023 Science Press. All rights reserved.
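The cosine-similarity step described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the short feature vectors stand in for encoder outputs, and the helpers `assign_mode` and `policy_loss`, the weight `lam`, and the sign convention (subtracting the similarity so that better-aligned samples get a lower loss) are hypothetical choices, since the abstract does not give the exact loss. The adversarial term and the min-max game between the policy group and the discriminator are not modeled here.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine of the angle between two feature vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero feature vectors
    return dot / max(norm_u * norm_v, eps)


def assign_mode(sample_feat: Sequence[float], mode_feats: Dict[str, Sequence[float]]) -> str:
    """Hypothetical helper (not from the paper): name of the expert mode
    whose feature vector is most similar to the sample's feature vector."""
    return max(mode_feats, key=lambda m: cosine_similarity(sample_feat, mode_feats[m]))


def policy_loss(adv_loss: float, sample_feat: Sequence[float],
                expert_feat: Sequence[float], lam: float = 0.5) -> float:
    """Hypothetical loss for one member of the policy group.

    adv_loss is an adversarial (GAIL-style) loss term computed elsewhere.
    Subtracting lam * cos is an assumed sign convention: samples whose
    features align with the expert features of this policy's mode get a
    lower loss.
    """
    return adv_loss - lam * cosine_similarity(sample_feat, expert_feat)


if __name__ == "__main__":
    # Toy "encoder features" for two expert modes (illustrative values only)
    modes = {"left": [1.0, 0.0], "right": [0.0, 1.0]}
    feat = [0.9, 0.1]
    print(assign_mode(feat, modes))  # left
    print(policy_loss(1.0, feat, modes["left"]))
```

Normalizing by both vector norms makes the score depend only on the direction of the features, not their magnitude, which is the usual reason to prefer cosine similarity over a raw dot product when comparing learned embeddings.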

Pages: 1358-1372

Page count: 14
