A visual localization method based on encoder-decoder dual-stream CNN

Authors
Jia R. [1]
Liu S. [1]
Li J. [2]
Wang Y. [2]
Pan H. [2]
Affiliations
[1] School of Information Science and Technology, North China University of Technology, Beijing
[2] College of Software, Beihang University, Beijing
Keywords
Convolutional neural network (CNN); Dual-stream network; Encoder-decoder architecture; Skip connection; Visual localization
DOI
10.13700/j.bh.1001-5965.2019.0046
Abstract
To calculate the camera pose from a single RGB image, a deep encoder-decoder dual-stream convolutional neural network (CNN) is proposed that improves the accuracy of visual localization. The network first uses an encoder to extract high-level features from the input image. Second, a pose decoder enhances the spatial resolution. Finally, a multi-scale estimator outputs the pose parameters. Because position and orientation estimation perform differently, the network adopts a dual-stream structure from the decoder onward to process position and orientation separately. To restore spatial information, several skip connections are added to the encoder-decoder architecture. The experimental results show that the accuracy of the network is clearly improved compared with comparable state-of-the-art algorithms, with the largest gain in the orientation accuracy of the camera pose. © 2019, Editorial Board of JBUAA. All rights reserved.
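The data flow described in the abstract can be traced at the level of tensor shapes. The following is a minimal sketch under stated assumptions, not the authors' implementation: the stage count, channel widths, input size, additive skip connections, pooled multi-scale features, and the 3-value position plus 4-value quaternion output are all illustrative choices that the abstract does not specify. All function names are hypothetical.

```python
# Shape-level sketch of an encoder-decoder dual-stream pose network.
# Every numeric choice below is an assumption made for illustration.

def encoder(input_shape, channels=(64, 128, 256, 512)):
    """Shared encoder: halve the spatial resolution at each stage.

    Returns the (channels, height, width) shape of every stage so the
    decoder can reuse them as skip connections.
    """
    _, h, w = input_shape
    stages = []
    for c in channels:
        h, w = h // 2, w // 2
        stages.append((c, h, w))
    return stages


def decoder(stages):
    """Pose decoder: upsample from the deepest stage back toward the input.

    Each upsampled map is assumed to be merged with the encoder map of the
    same resolution (a skip connection), so the shapes must match.
    """
    _, h, w = stages[-1]
    scales = []
    for skip_c, skip_h, skip_w in reversed(stages[:-1]):
        h, w = h * 2, w * 2
        if (h, w) != (skip_h, skip_w):
            raise ValueError("input size must be divisible by 2 ** len(stages)")
        # Assumed: channels are reduced to the skip width, then added.
        scales.append((skip_c, h, w))
    return scales


def multi_scale_estimator(scales, out_dim):
    """Assumed estimator: pool each scale to a vector, concatenate, regress."""
    feature_len = sum(c for c, _, _ in scales)
    return feature_len, out_dim


def dual_stream_network(input_shape=(3, 224, 224)):
    """One shared encoder, then separate decoder streams for each pose part."""
    stages = encoder(input_shape)
    position_scales = decoder(stages)      # stream 1: position
    orientation_scales = decoder(stages)   # stream 2: orientation
    return {
        "position": multi_scale_estimator(position_scales, 3),
        "orientation": multi_scale_estimator(orientation_scales, 4),
    }


if __name__ == "__main__":
    for name, (features, outputs) in dual_stream_network().items():
        print(f"{name}: {features} pooled features -> {outputs} outputs")
```

The two streams share the encoder output but keep separate decoder and estimator parameters, which is one way to read "dual-stream structure from the decoder" in the abstract.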
Pages: 1965-1972
Page count: 7