Multi-channel and multi-scale mid-level image representation for scene classification

Cited by: 7
Authors
Yang, Jinfu [1]
Yang, Fei [1]
Wang, Guanghui [2]
Li, Mingai [1]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
[2] Univ Kansas, Dept Elect Engn & Comp Sci, Lawrence, KS 66045 USA
Funding
National Natural Science Foundation of China;
Keywords
scene classification; convolutional neural network; multi-channel; mid-level representation; FEATURES;
DOI
10.1117/1.JEI.26.2.023018
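Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;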
Abstract
Convolutional neural network (CNN)-based approaches have achieved state-of-the-art results in scene classification. Features from the output of fully connected (FC) layers express one-dimensional semantic information but lose the detailed information of objects and the spatial information of scene categories. In contrast, deep convolutional features have been shown to be more suitable for describing an object itself and the spatial relations among objects in an image. In addition, the feature map from each layer is max-pooled within local neighborhoods, which weakens the invariance of global consistency and is unfavorable to scenes with highly complicated variation. To cope with these issues, an orderless multi-channel mid-level image representation on pre-trained CNN features is proposed to improve the classification performance. The mid-level image representations of two channels, from the FC layer and the deep convolutional layer, are integrated at multi-scale levels. A sum pooling approach is also employed to aggregate the multi-scale mid-level image representations and to highlight the importance of the descriptors beneficial for scene classification. Extensive experiments on the SUN397 and MIT 67 indoor datasets demonstrate that the proposed method achieves promising classification performance. (C) 2017 SPIE and IS&T
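As a rough illustration of the aggregation idea in the abstract, the following minimal sketch sum-pools local descriptors within each scale, sums the per-scale vectors, and concatenates the two channels. It is not the authors' implementation: the function names, the L2 normalization steps, the descriptor dimensions, and the random stand-in features are all assumptions made for illustration, and the mid-level encoding used in the paper is not specified in this record.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    # Scale a vector to unit length (eps avoids division by zero).
    return v / (np.linalg.norm(v) + eps)

def sum_pool_scales(per_scale_descriptors):
    # Hypothetical helper: each list item is an (n_patches, dim) array of
    # local descriptors extracted at one scale. Summing over patches is
    # orderless: the result does not depend on patch order.
    pooled = [l2_normalize(d.sum(axis=0)) for d in per_scale_descriptors]
    # Aggregate the per-scale vectors by sum pooling across scales.
    return l2_normalize(np.sum(pooled, axis=0))

def two_channel_representation(fc_scales, conv_scales):
    # Concatenate the FC-channel and convolutional-channel representations.
    return np.concatenate([sum_pool_scales(fc_scales),
                           sum_pool_scales(conv_scales)])

# Random stand-ins for pre-trained CNN features at three scales
# (1, 4, and 9 patches); the dimensions 8 and 16 are arbitrary.
rng = np.random.default_rng(0)
fc_scales = [rng.random((n, 8)) for n in (1, 4, 9)]
conv_scales = [rng.random((n, 16)) for n in (1, 4, 9)]
rep = two_channel_representation(fc_scales, conv_scales)
print(rep.shape)
```

The resulting vector would then be passed to a classifier; which classifier and which normalization the paper uses are not stated in the abstract.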
Pages: 9
Related papers
50 in total
  • [21] Visual saliency detection based on multi-scale and multi-channel mean
    Lang Sun
    Yan Tang
    Hong Zhang
    Multimedia Tools and Applications, 2016, 75 : 667 - 684
  • [22] Multi-scene image enhancement based on multi-channel illumination estimation
    Zhao, Runxing
    Wang, Zhiwen
    Guo, Wuyuan
    Zhang, Canlong
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 226
  • [23] Data augmentation via joint multi-scale CNN and multi-channel attention for bumblebee image generation
    Du Rong
    Chen Shudong
    Li Weiwei
    Zhang Xueting
    Wang Xianhui
    Ge Jin
    The Journal of China Universities of Posts and Telecommunications, 2023, 30 (03) : 32 - 40
  • [24] Multi-classification method of arrhythmia based on multi-scale residual neural network and multi-channel data fusion
    Zhang, Fuchun
    Li, Meng
    Song, Li
    Wu, Liang
    Wang, Baiyang
    FRONTIERS IN PHYSIOLOGY, 2023, 14
  • [25] Mining Mid-level Features for Image Classification
    Basura Fernando
    Elisa Fromont
    Tinne Tuytelaars
    International Journal of Computer Vision, 2014, 108 : 186 - 203
  • [26] A mid-level scene change representation via audiovisual alignment
    Wang, Jinqiao
    Duan, Lingyu
    Lu, Hanqing
    Jin, Jesse S.
    Xu, Changsheng
    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 1657 - 1660
  • [27] Mining Mid-level Features for Image Classification
    Fernando, Basura
    Fromont, Elisa
    Tuytelaars, Tinne
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2014, 108 (03) : 186 - 203
  • [28] Remote Sensing Image Scene Classification via Multi-Level Representation Learning
    Fu, Wei
    Yang, Lishuang
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 2942 - 2948
  • [29] Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition
    Yao, Cong
    Bai, Xiang
    Shi, Baoguang
    Liu, Wenyu
    2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 4042 - 4049
  • [30] CONTEXTUAL MULTI-SCALE IMAGE CLASSIFICATION ON QUADTREE
    Hedhli, Ihsen
    Moser, Gabriele
    Serpico, Sebastiano B.
    Zerubia, Josiane
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 1349 - 1353