Kernelized Bilinear CNN Models for Fine-Grained Visual Recognition

Authors
Ge S.-Y. [1]
Gao Z.-L. [1]
Zhang B.-B. [1]
Li P.-H. [1]
Affiliation
[1] School of Information and Communication Engineering, Dalian University of Technology, Dalian, 116024, Liaoning
Keywords
Bilinear convolutional neural network; End-to-end learning; Fine-grained visual recognition; Kernelized bilinear pooling
DOI
10.3969/j.issn.0372-2112.2019.10.015
Abstract
The bilinear convolutional neural network (B-CNN) has been widely used in computer vision. B-CNN captures the linear correlation between channels by taking the outer product of the features output by a convolutional layer, which enhances the representational ability of the network. Because it does not account for the non-linear relationships between the channels of the feature map, it cannot make full use of the richer information they contain. To solve this problem, this paper proposes a kernelized bilinear convolutional neural network that employs a kernel function to capture the non-linear relationships between the channels of the feature map, further enhancing the representational ability of the network. The method is evaluated on three common fine-grained benchmarks: CUB-200-2011, FGVC-Aircraft and Cars. Experiments show that it is superior to its counterparts on all three benchmarks. © 2019, Chinese Institute of Electronics. All rights reserved.
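The contrast drawn in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say which kernel function is used, so the RBF kernel, the `gamma` parameter, and the function names `bilinear_pool` and `rbf_kernel_pool` are all assumptions made for illustration. The first function averages the outer products of local features, which yields linear channel correlations; the second replaces each inner product between channel response vectors with a kernel evaluation.

```python
import numpy as np


def bilinear_pool(features):
    """Average outer product of local features.

    features: (N, C) array, N spatial locations with C channels each.
    Returns a (C, C) matrix of linear channel correlations.
    """
    n = features.shape[0]
    return features.T @ features / n


def rbf_kernel_pool(features, gamma=1.0):
    """Hypothetical kernelized variant (assumption: RBF kernel).

    Entry (i, j) is exp(-gamma * ||f_i - f_j||^2), where f_i is the
    response vector of channel i over all N spatial locations.
    """
    channels = features.T  # (C, N): one row per channel
    sq_norms = np.sum(channels ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (channels @ channels.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative rounding errors
    return np.exp(-gamma * sq_dists)


# Toy input: a 7x7 feature map with 8 channels, flattened to 49 locations.
rng = np.random.default_rng(0)
feats = rng.standard_normal((49, 8))

linear = bilinear_pool(feats)                # (8, 8) linear correlations
kernel = rbf_kernel_pool(feats, gamma=0.01)  # (8, 8) non-linear similarities
```

Both outputs are symmetric C x C matrices, so in this sketch the kernelized version could stand in wherever the bilinear matrix is consumed. Any normalization applied before the classifier is outside what the abstract describes.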
Pages: 2134-2141
Number of pages: 7