DKTNet: Dual-Key Transformer Network for small object detection

Cited by: 28

Authors:

Xu, Shoukun [1]

Gu, Jianan [1]

Hua, Yining [2]

Liu, Yi [1]

Affiliations:

[1] Changzhou Univ, Changzhou 213164, Jiangsu, Peoples R China

[2] Univ Aberdeen, Aberdeen, Scotland

Source:

NEUROCOMPUTING | 2023 / Vol. 525

Funding:

National Natural Science Foundation of China;

Keywords:

Small object detection; Transformer; Dual-key;

DOI:

10.1016/j.neucom.2023.01.055

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Object detection is a fundamental computer vision task that plays a crucial role in a wide range of real-world applications. However, detecting small objects in complex scenes remains challenging, owing to the low resolution and noisy appearance caused by occlusion, distant viewpoints, etc. To tackle this issue, a novel transformer architecture, the Dual-Key Transformer Network (DKTNet), is proposed in this paper. To improve the feature attention ability, the coherence of the linear layer outputs Q and V is enhanced by a dual-K integrated from K1 and K2, which are computed along Q and V, respectively. Instead of spatial-wise attention, a channel-wise self-attention mechanism is adopted to promote the important feature channels and suppress the confusing ones. Moreover, 2D and 1D convolution computations for Q, K and V are proposed. Compared with the fully-connected computation in conventional transformer architectures, the 2D convolution can better capture local details and global contextual information, and the 1D convolution can reduce network complexity significantly. Experimental evaluation is conducted on both general and small object detection datasets. The superiority of the aforementioned features in the proposed approach is demonstrated by comparison against state-of-the-art approaches. (c) 2023 Elsevier B.V. All rights reserved.
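
Illustrative note: the abstract contrasts channel-wise with spatial-wise self-attention. The sketch below shows only that generic idea, in plain Python, and is not the DKTNet implementation. The function name `channel_attention`, the scaling by the square root of the number of positions, and the toy input are all assumptions; the dual-K fusion and the 2D/1D convolutions are not specified in the abstract and are left out. For a feature map with C channels and N flattened positions, the attention map is C x C rather than N x N.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def channel_attention(q, k, v):
    """Generic channel-wise attention (assumed form, not the paper's).

    q, k, v are C x N matrices: C channels, N flattened positions.
    Returns the C x N output and the C x C attention weights.
    """
    n = len(q[0])
    # Channel-to-channel similarity: (C x N) @ (N x C) -> C x C.
    scores = matmul(q, transpose(k))
    # Assumed scaling by sqrt(N), by analogy with scaled dot-product attention.
    weights = [softmax([s / math.sqrt(n) for s in row]) for row in scores]
    # Each output channel is a weighted mix of the value channels.
    return matmul(weights, v), weights


# Toy feature map: 2 channels, 3 positions (hypothetical values).
x = [[1.0, 0.0, 2.0],
     [0.5, 1.0, 0.0]]
out, weights = channel_attention(x, x, x)
print(len(weights), len(weights[0]))  # 2 2
print(len(out), len(out[0]))          # 2 3
```

Because the attention map scales with the number of channels instead of the number of positions, its size does not grow with image resolution.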

Pages: 29-41

Page count: 13
