Conformer: Local Features Coupling Global Representations for Visual Recognition

Citations: 486

Authors:

Peng, Zhiliang [1]

Huang, Wei [1]

Gu, Shanzhi [3]

Xie, Lingxi [2]

Wang, Yaowei [3]

Jiao, Jianbin [1]

Ye, Qixiang [1,3]

Affiliations:

[1] Univ Chinese Acad Sci, Beijing, Peoples R China

[2] Huawei Inc, Shenzhen, Peoples R China

[3] Peng Cheng Lab, Shenzhen, Peoples R China

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

Funding:

National Natural Science Foundation of China;

Keywords:

SCALE;

DOI:

10.1109/ICCV48922.2021.00042

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Within a Convolutional Neural Network (CNN), convolution operations are good at extracting local features but have difficulty capturing global representations. Within a visual transformer, cascaded self-attention modules can capture long-distance feature dependencies but unfortunately deteriorate local feature details. In this paper, we propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning. Conformer is rooted in the Feature Coupling Unit (FCU), which fuses local features and global representations under different resolutions in an interactive fashion. Conformer adopts a concurrent structure so that local features and global representations are retained to the maximum extent. Experiments show that Conformer, under comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet. On MSCOCO, it outperforms ResNet-101 by 3.7% and 3.6% mAP for object detection and instance segmentation, respectively, demonstrating great potential as a general backbone network. Code is available at github.com/pengzhiliang/Conformer.
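
The abstract says the FCU fuses local features and global representations "under different resolutions". The NumPy sketch below illustrates one way such a coupling could work; it is not the authors' implementation (see the repository linked above for that). The function names, the 1x1 projections, average pooling, nearest-neighbour upsampling, and all shapes are assumptions chosen for illustration, and normalization layers and any class token are omitted.

```python
import numpy as np

def cnn_to_tokens(feat, proj, stride):
    """Map a CNN feature map (C, H, W) to patch tokens (N, D). Hypothetical helper."""
    c, h, w = feat.shape
    # A 1x1 convolution is a per-pixel linear map over channels (C -> D).
    aligned = np.einsum("dc,chw->dhw", proj, feat)
    d = aligned.shape[0]
    # Average-pool non-overlapping stride x stride windows to match the token grid.
    pooled = aligned.reshape(d, h // stride, stride, w // stride, stride).mean(axis=(2, 4))
    # Flatten the spatial grid into a token sequence of shape (N, D).
    return pooled.reshape(d, -1).T

def tokens_to_cnn(tokens, proj, grid, stride):
    """Map patch tokens (N, D) back to a feature map (C, H, W). Hypothetical helper."""
    gh, gw = grid
    d = tokens.shape[1]
    # Restore the 2D token grid, then project channels (D -> C).
    fmap = tokens.T.reshape(d, gh, gw)
    aligned = np.einsum("cd,dhw->chw", proj, fmap)
    # Nearest-neighbour upsampling back to the CNN resolution.
    return aligned.repeat(stride, axis=1).repeat(stride, axis=2)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))   # CNN branch: C=8, H=W=16 (assumed sizes)
tokens = rng.standard_normal((16, 12))    # transformer branch: N=4*4 tokens, D=12
w_down = rng.standard_normal((12, 8))     # assumed projection, C -> D
w_up = rng.standard_normal((8, 12))       # assumed projection, D -> C

# Interactive fusion: each branch receives the other's features by addition.
fused_tokens = tokens + cnn_to_tokens(feat, w_down, stride=4)
fused_feat = feat + tokens_to_cnn(fused_tokens, w_up, grid=(4, 4), stride=4)
print(fused_tokens.shape, fused_feat.shape)
```

In this sketch both branches keep their own shape after fusion, (16, 12) for the tokens and (8, 16, 16) for the feature map, which mirrors the concurrent two-branch structure the abstract describes.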

Pages: 357-366

Page count: 10
