RGBManip: Monocular Image-based Robotic Manipulation through Active Object Pose Estimation

Cited by: 0
Authors
An, Boshi [1, 2]
Geng, Yiran [1, 2]
Chen, Kai [4]
Li, Xiaoqi [1, 2, 3]
Dou, Qi [4]
Dong, Hao [1, 2]
Affiliations
[1] Peking Univ, Sch CS, Hyperplane Lab, Beijing, Peoples R China
[2] Natl Key Lab Multimedia Informat Proc, Beijing, Peoples R China
[3] Beijing Acad Artificial Intelligence BAAI, Beijing, Peoples R China
[4] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/ICRA57147.2024.10610690
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Robotic manipulation requires accurate perception of the environment, which poses a significant challenge due to its inherent complexity and constantly changing nature. In this context, RGB image and point-cloud observations are two commonly used modalities in visual-based robotic manipulation, but each of these modalities has its own limitations. Commercial point-cloud observations often suffer from issues like sparse sampling and noisy output due to the limits of the emission-reception imaging principle. On the other hand, RGB images, while rich in texture information, lack the depth and 3D information crucial for robotic manipulation. To mitigate these challenges, we propose an image-only robotic manipulation framework that leverages an eye-on-hand monocular camera installed on the robot's parallel gripper. By moving with the robot gripper, this camera gains the ability to actively perceive the object from multiple perspectives during the manipulation process. This enables the estimation of 6D object poses, which can be utilized for manipulation. While obtaining images from more diverse viewpoints typically improves pose estimation, it also increases the manipulation time. To address this trade-off, we employ a reinforcement learning policy to synchronize the manipulation strategy with active perception, achieving a balance between 6D pose accuracy and manipulation efficiency. Our experimental results in both simulated and real-world environments showcase the state-of-the-art effectiveness of our approach. We believe that our method will inspire further research on real-world-oriented robotic manipulation. See https://rgbmanip.github.io/ for more details.
Pages: 7748-7755
Page count: 8
Related papers
50 in total
  • [41] Monocular-Based Pose Estimation Based on Fiducial Markers for Space Robotic Capture Operations in GEO
    Opromolla, Roberto
    Vela, Claudio
    Nocerino, Alessia
    Lombardi, Carlo
    REMOTE SENSING, 2022, 14 (18)
  • [42] Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image
    Fan, Zhaoxin
    Song, Zhenbo
    Xu, Jian
    Wang, Zhicheng
    Wu, Kejian
    Liu, Hongyan
    He, Jun
    COMPUTER VISION - ECCV 2022, PT II, 2022, 13662 : 220 - 236
  • [43] A Unified Framework for Depth-Assisted Monocular Object Pose Estimation
    Hoang, Dinh-Cuong
    Tan, Phan Xuan
    Nguyen, Thu-Uyen
    Pham, Hai-Nam
    Nguyen, Chi-Minh
    Bui, Son-Anh
    Duong, Quang-Tri
    Vu, van-Duc
    Nguyen, van-Thiep
    Duong, van-Hiep
    Hoang, Ngoc-Anh
    Phan, Khanh-Toan
    Tran, Duc-Thanh
    Ho, Ngoc-Trung
    Tran, Cong-Trinh
    IEEE ACCESS, 2024, 12 : 111723 - 111740
  • [44] 3-D head pose estimation for monocular image
    Pan, YJ
    Zhu, H
    Ji, RR
    FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 2, PROCEEDINGS, 2005, 3614 : 293 - 301
  • [45] Challenges for Monocular 6-D Object Pose Estimation in Robotics
    Thalhammer, Stefan
    Bauer, Dominik
    Hoenig, Peter
    Weibel, Jean-Baptiste
    Garcia-Rodriguez, Jose
    Vincze, Markus
    IEEE TRANSACTIONS ON ROBOTICS, 2024, 40 : 4065 - 4084
  • [46] A survey on joint object detection and pose estimation using monocular vision
    Patil, Aniruddha V.
    Rabha, Pankaj
    2018 INTERNATIONAL JOINT CONFERENCE ON METALLURGICAL AND MATERIALS ENGINEERING (JCMME 2018), 2019, 277
  • [47] Image-Based Pose Estimation Using a Compact 3D Model
    Heisterklaus, Iris
    Qian, Ningqing
    Miller, Artur
    2014 IEEE FOURTH INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS BERLIN (ICCE-BERLIN), 2014, : 327 - 330
  • [48] Image-based object editing
    Rushmeier, H
    Gomes, J
    Balmelli, L
    Bernardini, F
    Taubin, G
    FOURTH INTERNATIONAL CONFERENCE ON 3-D DIGITAL IMAGING AND MODELING, PROCEEDINGS, 2003, : 20 - 27
  • [49] RGB-D Image-based Pose Estimation with Monte Carlo Localization
    Li, Ming
    Qin, Hao
    Huang, May
    Cao, Jian
    Zhang, Xing
    2017 3RD INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND ROBOTICS (ICCAR), 2017, : 109 - 114
  • [50] Image-Based Tactile Deformation Simulation and Pose Estimation for Robot Skill Learning
    Fu, Chenfeng
    Li, Longnan
    Gao, Yuan
    Wan, Weiwei
    Harada, Kensuke
    Lu, Zhenyu
    Yang, Chenguang
    APPLIED SCIENCES-BASEL, 2025, 15 (03):