A complementary dual-backbone transformer extracting and fusing weak cues for object detection in extremely dark videos

Cited by: 6

Authors:

Zhang, Bo [1]

Suo, Jinli [1,2,3]

Dai, Qionghai [1,2]

Affiliations:

[1] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China

[2] Tsinghua Univ, Inst Brain & Cognit Sci, Beijing 100084, Peoples R China

[3] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China

Source:

INFORMATION FUSION | 2023 / Vol. 97

Funding:

National Natural Science Foundation of China; Beijing Natural Science Foundation;

Keywords:

Low-light video; Object detection; Transformer; Feature aggregation; Feature fusion;

DOI:

10.1016/j.inffus.2023.101822

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Reliable object detection in dark environments has wide applications but is severely challenged by heavy noise that washes out informative features and by uneven radiance caused by nighttime illumination. These characteristics of dark videos substantially degrade the performance of existing detectors. Addressing this issue requires specially designed algorithms that can extract and fuse the weak features buried in low-quality videos. With this in mind, we propose illumination-aware spatio-temporal feature fusion modules for low-light video object detection and implement a Dark Video Detector under a TRansformer network structure, dubbed DVD-TR. First, we use a dual-backbone Transformer to extract separate complementary features and fuse them to strengthen the network's feature extraction capability. Second, we incorporate a spatio-temporal sampling mechanism to aggregate features from multiple frames, which can enhance detection accuracy in dark videos. Third, we use a small encoder-decoder network to estimate the irradiance distribution, which is then incorporated for illumination-aware feature fusion. Extensive experiments on a large-scale multi-illuminance dark video benchmark show that DVD-TR outperforms state-of-the-art video detectors by a large margin and validate the effectiveness of the proposed approach.


Pages: 12