RGBManip: Monocular Image-based Robotic Manipulation through Active Object Pose Estimation

Authors
An, Boshi [1, 2]
Geng, Yiran [1, 2]
Chen, Kai [4]
Li, Xiaoqi [1, 2, 3]
Dou, Qi [4]
Dong, Hao [1, 2]
Affiliations
[1] Peking Univ, Sch CS, Hyperplane Lab, Beijing, Peoples R China
[2] Natl Key Lab Multimedia Informat Proc, Beijing, Peoples R China
[3] Beijing Acad Artificial Intelligence BAAI, Beijing, Peoples R China
[4] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/ICRA57147.2024.10610690
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Robotic manipulation requires accurate perception of the environment, which poses a significant challenge due to its inherent complexity and constantly changing nature. In this context, RGB images and point clouds are two commonly used modalities in vision-based robotic manipulation, but each has its own limitations. Point-cloud observations from commercial sensors often suffer from sparse sampling and noisy output due to the limits of the emission-reception imaging principle. RGB images, on the other hand, are rich in texture information but lack the depth and 3D information that is crucial for robotic manipulation. To mitigate these challenges, we propose an image-only robotic manipulation framework that leverages an eye-on-hand monocular camera installed on the robot's parallel gripper. By moving with the robot gripper, this camera can actively perceive the object from multiple perspectives during the manipulation process. This enables the estimation of 6D object poses, which can then be used for manipulation. While obtaining images from a larger and more diverse set of viewpoints typically improves pose estimation, it also increases the manipulation time. To address this trade-off, we employ a reinforcement learning policy to synchronize the manipulation strategy with active perception, achieving a balance between 6D pose accuracy and manipulation efficiency. Our experimental results in both simulated and real-world environments demonstrate the state-of-the-art effectiveness of our approach. We believe that our method will inspire further research on real-world-oriented robotic manipulation. See https://rgbmanip.github.io/ for more details.
Pages: 7748-7755
Page count: 8