Robust Depth Estimation Based on Parallax Attention for Aerial Scene Perception

Cited by: 3
Authors
Tong, Wei [1, 2]
Zhang, Miaomiao [3]
Zhu, Guangyu [4]
Xu, Xin [5]
Wu, Edmond Q. [3]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Automat, Nanjing 210023, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Coll Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[4] Beijing Jiaotong Univ China, Sch Traff & Transportat, Beijing 100044, Peoples R China
[5] Natl Univ Def Technol, Coll Intelligence Sci & Technol, Changsha 410005, Peoples R China
Keywords
Costs; Feature extraction; Estimation; Transformers; Task analysis; Convolution; Training; Disparity estimation; parallax attention; stereo matching; transformer
DOI
10.1109/TII.2024.3392270
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Given precalibrated image pairs, stereo matching aims to infer scene depth in real time, which has important research value for high-precision 3-D reconstruction of the Earth's surface, autonomous driving, and unmanned aerial vehicle (UAV) navigation. Cost volume-based stereo matching methods construct a cascaded cost volume in a coarse-to-fine manner and apply 3-D convolution to capture feature-matching correspondences and infer the disparity map, which achieves comparable performance. However, existing methods have difficulty handling jitter regions where the disparity changes, and direct disparity regression easily leads to overfitting in cost volume regularization. To alleviate these two problems, this work proposes an end-to-end disparity estimation network based on the Transformer. Its specific improvements are as follows. 1) A Transformer-based cross-view feature interaction module is introduced to realize feature interaction over global context information. 2) A parallax attention mechanism is designed to impose global geometric constraints along the epipolar line, improving the reliability of feature matching. 3) Focal loss is applied to train the disparity classification model, emphasizing one-hot supervision in ambiguous regions. Comprehensive experiments on the public Sceneflow, KITTI2015, and ETH3D datasets and the aerial WHU dataset validate that the proposed work effectively enhances disparity estimation performance.
Pages: 10761-10769
Number of pages: 9