Vision-Based Efficient Robotic Manipulation with a Dual-Streaming Compact Convolutional Transformer

Cited by: 2
Authors
Guo, Hao [1]
Song, Meichao [1]
Ding, Zhen [1]
Yi, Chunzhi [2]
Jiang, Feng [1]
Affiliations
[1] Harbin Institute of Technology, School of Computer Science and Technology, Harbin 150001, People's Republic of China
[2] Harbin Institute of Technology, School of Medicine and Health, Harbin 150001, People's Republic of China
Keywords
bio-inspired design and control of robots; robotics; reinforcement learning; vision transformer; LEVEL;
DOI
10.3390/s23010515
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
Learning from visual observations for efficient robotic manipulation remains a significant challenge in reinforcement learning (RL). Although pairing RL policies with a convolutional neural network (CNN) visual encoder achieves high efficiency and success rates, the method's general performance across multiple tasks is still limited by the efficacy of the encoder. Meanwhile, the increasing cost of optimizing the encoder for general performance could erode the efficiency advantage of the original policy. Building on the attention mechanism, we design a robotic manipulation method that significantly improves the policy's general performance across multiple tasks by using a lightweight Transformer-based visual encoder, unsupervised learning, and data augmentation. The encoder of our method can match the performance of the original Transformer with much less data, ensuring efficiency in the training process and strengthening general multi-task performance. Furthermore, we experimentally demonstrate that the master view outperforms the alternative third-person views in general robotic manipulation tasks when third-person and egocentric views are combined to capture global and local visual information. In extensive experiments on tasks from the OpenAI Gym Fetch environment, especially the Push task, our method achieves a 92% success rate versus baseline success rates of 65%, 78% for the CNN encoder, and 81% for the ViT encoder, and it does so with fewer training steps.
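The dual-view idea in the abstract (one third-person view plus one egocentric view, each augmented before encoding) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the augmentation, the tokenizer, or the fusion step, so random cropping, plain non-overlapping patch splitting (in place of a convolutional tokenizer), and simple concatenation of the two token sequences are all assumptions made here for illustration. The function names and the default crop and patch sizes are likewise hypothetical.

```python
import numpy as np


def random_crop(img, out_size, rng):
    """Cut a random out_size x out_size window from an (H, W, C) image."""
    h, w, _ = img.shape
    top = rng.integers(0, h - out_size + 1)
    left = rng.integers(0, w - out_size + 1)
    return img[top:top + out_size, left:left + out_size]


def patch_tokens(img, patch):
    """Split an (H, W, C) image into non-overlapping patches.

    Returns an array of shape (num_patches, patch * patch * C),
    one flattened token per patch, in row-major patch order.
    """
    h, w, c = img.shape
    gh, gw = h // patch, w // patch
    x = img[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)


def dual_stream_tokens(third_person, egocentric, crop=84, patch=12, seed=0):
    """Augment each view independently, tokenize it, and stack both streams.

    Assumption: the two streams are fused by concatenating their token
    sequences; a Transformer encoder (not shown) would consume the result.
    """
    rng = np.random.default_rng(seed)
    streams = [
        patch_tokens(random_crop(view, crop, rng), patch)
        for view in (third_person, egocentric)
    ]
    return np.concatenate(streams, axis=0)


if __name__ == "__main__":
    third = np.zeros((100, 100, 3), dtype=np.float32)  # global view
    ego = np.ones((100, 100, 3), dtype=np.float32)     # local view
    tokens = dual_stream_tokens(third, ego)
    print(tokens.shape)
```

With two 100 x 100 RGB frames, each 84 x 84 crop yields a 7 x 7 grid of 12 x 12 patches, so the combined sequence holds 98 tokens of dimension 432. The first half of the sequence carries global (third-person) information and the second half carries local (egocentric) information.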
Pages: 17