Conformer: Local Features Coupling Global Representations for Visual Recognition

Citations: 486
Authors
Peng, Zhiliang [1]
Huang, Wei [1]
Gu, Shanzhi [3]
Xie, Lingxi [2]
Wang, Yaowei [3]
Jiao, Jianbin [1]
Ye, Qixiang [1, 3]
Affiliations
[1] Univ Chinese Acad Sci, Beijing, Peoples R China
[2] Huawei Inc, Shenzhen, Peoples R China
[3] Peng Cheng Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
SCALE;
DOI
10.1109/ICCV48922.2021.00042
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Within Convolutional Neural Network (CNN), the convolution operations are good at extracting local features but experience difficulty to capture global representations. Within visual transformer, the cascaded self-attention modules can capture long-distance feature dependencies but unfortunately deteriorate local feature details. In this paper, we propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning. Conformer roots in the Feature Coupling Unit (FCU), which fuses local features and global representations under different resolutions in an interactive fashion. Conformer adopts a concurrent structure so that local features and global representations are retained to the maximum extent. Experiments show that Conformer, under the comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet. On MSCOCO, it outperforms ResNet-101 by 3.7% and 3.6% mAPs for object detection and instance segmentation, respectively, demonstrating the great potential to be a general backbone network. Code is available at github.com/pengzhiliang/Conformer.
Pages: 357-366
Number of pages: 10