A survey: object detection methods from CNN to transformer

Cited by: 52

Authors:

Arkin, Ershat [1]

Yadikar, Nurbiya [1]

Xu, Xuebin [1]

Aysa, Alimjan [2]

Ubul, Kurban [1,2]

Affiliations:

[1] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830046, Peoples R China

[2] Xinjiang Univ, Key Lab Multilingual Informat Technol, Urumqi 830046, Peoples R China

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2023 / Vol. 82 / Issue 14

Funding:

National Science Foundation (United States);

Keywords:

Computer vision; Object detection; Real-time system; CNN; Transformer; NETWORKS;

DOI:

10.1007/s11042-022-13801-3

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Object detection is the most important problem in computer vision. Since AlexNet was proposed, methods based on Convolutional Neural Networks (CNNs) have become mainstream in the field, and much research on neural networks and on variations of algorithm structure has appeared. Achieving fast and accurate detection requires moving beyond the existing CNN framework, which poses great challenges. The Transformer's relatively mature theoretical support and technological development in Natural Language Processing have brought it to researchers' attention, and it has been shown that Transformer-based methods can be used for computer vision tasks and that they exceed existing CNN methods in some of them. To help more researchers understand the development of object detection methods, existing methods, different frameworks, challenging problems, and development trends, this paper introduces the classic CNN-based object detection methods and discusses their highlights, advantages, and disadvantages. Drawing on a large body of literature, the paper compares different CNN detection methods and Transformer detection methods. For a vertical comparison under fair conditions, 13 detection methods that have had a broad impact on the field and are the most mainstream and promising are selected. The comparative data give us confidence in the development of the Transformer and in the convergence between different methods. The paper also presents recent innovative approaches to using the Transformer in computer vision tasks. Finally, the challenges, opportunities, and future prospects of the field are summarized.

Pages: 21353-21383

Page count: 31
