Policy Learning with an Efficient Black-Box Optimization Algorithm

Cited by: 1

Authors:

Hwangbo, Jemin [1]

Gehring, Christian [1]

Sommer, Hannes [1]

Siegwart, Roland [1]

Buchli, Jonas [2]

Affiliations:

[1] Swiss Fed Inst Technol, Autonomous Syst Lab, Zurich, Switzerland

[2] Swiss Fed Inst Technol, Agile & Dexterous Robot Lab, Zurich, Switzerland

Source:

INTERNATIONAL JOURNAL OF HUMANOID ROBOTICS | 2015 / Vol. 12 / No. 03

Funding:

Swiss National Science Foundation;

Keywords:

Policy optimization; robotic learning; black-box optimization;

DOI:

10.1142/S0219843615500292

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification code:

080202 ; 1405 ;

Abstract:

Robotic learning on real hardware requires an efficient algorithm that minimizes the number of trials needed to learn an optimal policy. Prolonged use of hardware causes wear and tear on the system and demands more attention from an operator. To this end, we present a novel black-box optimization algorithm, Reward Optimization with Compact Kernels and fast natural gradient regression (ROCK*). Our algorithm updates its knowledge immediately after a single trial and is able to extrapolate in a controlled manner. These features make fast and safe learning on real hardware possible. The performance of our method is evaluated with standard benchmark functions that are commonly used to test optimization algorithms. We also present three different robotic optimization examples using ROCK*. The first robotic example is on a simulated robot arm, the second is on a real articulated legged system, and the third is on a simulated quadruped robot with 12 actuated joints. ROCK* outperforms the current state-of-the-art algorithms in all tasks, sometimes by an order of magnitude.


Pages: 20
