Multi-Modal Imitation Learning Method with Cosine Similarity

Cited by: 0
Authors
Hao S. [1 ]
Liu Q. [1 ,2 ,3 ,4 ]
Xu P. [1 ]
Zhang L. [1 ]
Huang Z. [1 ]
Affiliations
[1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu
[2] Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University, Nanjing
[3] Key Laboratory of Symbolic Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun
[4] Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, Jiangsu
Funding
National Natural Science Foundation of China
Keywords
cosine similarity; generative adversarial imitation learning; inverse reinforcement learning; mode collapse; multi-modal;
DOI
10.7544/issn1000-1239.202220119
Abstract
Generative adversarial imitation learning (GAIL) is an inverse reinforcement learning (IRL) method that imitates expert policies from expert demonstrations within a generative adversarial framework. In practical tasks, expert demonstrations are often generated by multi-modal policies, yet most existing GAIL methods assume that the demonstrations come from a single modal policy. This assumption leads to the mode collapse problem, in which GAIL learns only some of the modal policies, greatly limiting its applicability to multi-modal tasks. To address mode collapse, we propose a multi-modal imitation learning method with cosine similarity (MCS-GAIL). The method introduces an encoder and a policy group: the encoder extracts the modal features of the expert demonstrations, the cosine similarity between the features of policy-sampled trajectories and those of the expert demonstrations is computed, and this similarity term is added to the policy group's loss function to help each policy learn the expert policy of the corresponding modality. In addition, MCS-GAIL uses a new min-max game formulation that lets the policies in the group learn different modal policies in a complementary way. Under certain assumptions, we prove the convergence of MCS-GAIL through theoretical analysis. To verify its effectiveness, MCS-GAIL is implemented on the Grid World and MuJoCo platforms and compared with existing methods that address mode collapse. The experimental results show that MCS-GAIL effectively learns multiple modal policies in all environments with high accuracy and stability. © 2023 Science Press. All rights reserved.
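The record does not include code, but the cosine-similarity term described in the abstract can be illustrated with a minimal NumPy sketch. Everything below is hypothetical: the encoder, the weighting coefficient lambda_sim, and the placeholder adversarial loss are illustrative stand-ins, not the authors' implementation.

```python
import numpy as np

def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two feature vectors, guarded against zero norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

def encode(trajectory, weights):
    """Hypothetical encoder: maps a trajectory of state-action vectors to a
    modal feature vector (here, mean pooling + one linear layer + tanh)."""
    return np.tanh(trajectory.mean(axis=0) @ weights)

# Toy data: one expert demonstration and one policy rollout, each a
# sequence of 50 concatenated state-action vectors of dimension 6.
rng = np.random.default_rng(0)
expert_traj = rng.normal(size=(50, 6))
policy_traj = rng.normal(size=(50, 6))
W = rng.normal(size=(6, 4))  # encoder weights, feature dimension 4

z_expert = encode(expert_traj, W)
z_policy = encode(policy_traj, W)

# Similarity between the policy's modal features and those of the matching
# expert demonstrations; higher means better modal alignment.
sim = cosine_similarity(z_policy, z_expert)

# Hypothetical combined objective: the usual adversarial imitation loss
# minus a weighted similarity bonus. adversarial_loss and lambda_sim are
# placeholders, not values from the paper.
adversarial_loss = 1.23
lambda_sim = 0.1
policy_loss = adversarial_loss - lambda_sim * sim
print(f"cosine similarity = {sim:.3f}, policy loss = {policy_loss:.3f}")
```

In this reading, each policy in the group is pulled toward the expert demonstrations whose encoded features it most resembles, which is one plausible way the similarity term could counteract mode collapse.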
Pages: 1358-1372
Page count: 14