Enhancing object pose estimation for RGB images in cluttered scenes

Times cited: 0
Authors
Al-Selwi, Metwalli [1, 2, 3, 4]
Ning, Huang [3]
Gao, Yin [1, 3, 4]
Chao, Yan [3]
Li, Qiming [3]
Li, Jun [1, 2, 3, 4]
Affiliations
[1] Chinese Acad Sci, Fujian Inst Res Struct Matter, Fuzhou, Fujian, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, Quanzhou Inst Equipment Mfg, Haixi Inst, Quanzhou, Fujian, Peoples R China
[4] Univ Chinese Acad Sci, Fujian Coll, Fuzhou, Fujian, Peoples R China
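Source
SCIENTIFIC REPORTS | 2025 / Vol. 15 / Issue 01
Funding
National Natural Science Foundation of China
Keywords
6D object pose estimation; Heavy occlusion; Cluttered scenes; Convolutional neural networks; Self-attention mechanisms
DOI
10.1038/s41598-025-90482-6
Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Subject classification codes
07; 0710; 09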
Abstract
Estimating the 6D pose of objects is crucial for robots to interact with their environment. Yet 6D object pose estimation from RGB images in cluttered scenes with heavy occlusion remains a critical challenge. Most existing methods estimate object pose in two stages: they first extract object features and then apply PnP/RANSAC to recover the pose. However, most of these techniques merely localize a set of keypoints by regressing their coordinates, which makes them vulnerable to occlusion and leads to poor performance in multi-object pose estimation. In addition, these methods cannot regress the 6D pose directly from a loss during training. In this paper, we propose an end-to-end framework based on a convolutional neural network (CNN) and a self-attention mechanism for single- and multi-object 6D pose estimation from RGB images at low computational cost. Our method uses feature fusion to extract local features and combines multi-head self-attention (MHSA) with iterative refinement to improve pose estimation performance. Furthermore, our method can be scaled according to the available computational resources. Experiments on the Linemod and Occlusion Linemod benchmark datasets show that our method achieves 97.45% and 84.84%, respectively, in terms of the ADD(-S) metric.
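The results above are reported with the ADD(-S) metric. As a hedged illustration only, the sketch below follows the convention commonly used for Linemod: ADD averages the distance between each 3D model point transformed by the ground-truth pose and by the estimated pose; ADD-S, used for symmetric objects, instead measures the distance to the closest estimated point; and a pose counts as correct when the error is below a fraction of the object diameter. The 10% threshold, the function names (`transform`, `add_error`, `adds_error`, `add_s_accuracy`), and the brute-force nearest-neighbour search are assumptions made for this sketch; it is not the authors' evaluation code.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
# One evaluated sample (hypothetical layout): (R_gt, t_gt, R_est, t_est).
PoseSample = Tuple[Mat3, Vec3, Mat3, Vec3]


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply the rigid transform p -> R p + t to one 3D model point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_error(R_gt: Mat3, t_gt: Vec3, R_est: Mat3, t_est: Vec3,
              points: List[Vec3]) -> float:
    """ADD: mean distance between the same model point under both poses."""
    return sum(
        math.dist(transform(R_gt, t_gt, p), transform(R_est, t_est, p))
        for p in points
    ) / len(points)


def adds_error(R_gt: Mat3, t_gt: Vec3, R_est: Mat3, t_est: Vec3,
               points: List[Vec3]) -> float:
    """ADD-S: mean distance from each ground-truth point to the closest
    estimated point, so poses that are equivalent under symmetry score low."""
    gt_pts = [transform(R_gt, t_gt, p) for p in points]
    est_pts = [transform(R_est, t_est, p) for p in points]
    # Brute-force O(N^2) nearest-neighbour search; adequate for a small sketch.
    return sum(min(math.dist(g, e) for e in est_pts) for g in gt_pts) / len(gt_pts)


def add_s_accuracy(samples: List[PoseSample], points: List[Vec3],
                   diameter: float, symmetric: bool,
                   threshold: float = 0.1) -> float:
    """Fraction of samples whose ADD (or ADD-S, if symmetric) error is below
    threshold * diameter. The 0.1 default is an assumed, commonly used value."""
    err_fn = adds_error if symmetric else add_error
    hits = sum(
        1 for (R_gt, t_gt, R_est, t_est) in samples
        if err_fn(R_gt, t_gt, R_est, t_est, points) < threshold * diameter
    )
    return hits / len(samples)


if __name__ == "__main__":
    I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    FLIP_Z = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]  # 180 deg about z
    origin = (0.0, 0.0, 0.0)
    # Toy object: four points on a 10 cm-diameter ring (symmetric under FLIP_Z).
    pts = [(0.05, 0.0, 0.0), (-0.05, 0.0, 0.0), (0.0, 0.05, 0.0), (0.0, -0.05, 0.0)]

    samples = [
        (I3, origin, I3, (0.005, 0.0, 0.0)),  # 5 mm offset: inside 10% of 0.1 m
        (I3, origin, I3, (0.02, 0.0, 0.0)),   # 20 mm offset: outside the threshold
    ]
    print("ADD accuracy:", add_s_accuracy(samples, pts, diameter=0.1, symmetric=False))
    print("ADD of flipped pose:", add_error(I3, origin, FLIP_Z, origin, pts))
    print("ADD-S of flipped pose:", adds_error(I3, origin, FLIP_Z, origin, pts))
```

The 180° flip in the usage example shows why ADD-S exists: for a symmetric point set the flipped pose is indistinguishable from the true one, so ADD reports a large error while ADD-S reports essentially none.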
Pages: 15