A coarse-to-fine capsule network for fine-grained image categorization

Cited by: 7
Authors
Lin, Zhongqi [1, 2]
Jia, Jingdun [2]
Huang, Feng [3]
Gao, Wanlin [1, 2]
Affiliations
[1] China Agr Univ, Coll Informat & Elect Engn, Beijing 100083, Peoples R China
[2] Minist Agr & Rural Affairs, Key Lab Agr Informatizat Standardizat, Beijing 100083, Peoples R China
[3] China Agr Univ, Coll Sci, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Capsule network (CapsNet); Fine-grained image classification; Coarse-to-fine attention; Increasingly specialized perception; MODEL;
DOI
10.1016/j.neucom.2021.05.032
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fine-grained image categorization is challenging because the subordinate categories within an entry-level category can be distinguished only by subtle differences. This necessitates alternately localizing the key (most discriminative) regions and extracting domain-specific features. Existing methods predominantly realize fine-grained categorization independently, ignoring that representation learning and foreground localization can reinforce each other iteratively. Building on the state-of-the-art performance of capsule encoding for abstract semantic representation, we formalize our pipeline as a coarse-to-fine capsule network (CTF-CapsNet). It consists of customized expert CapsNets arranged at each perception scale and region proposal networks (RPNs) between each pair of adjacent scales. Their mutually motivated self-optimization can achieve increasingly specialized cross-utilization of object-level and component-level descriptions. The RPN zooms in on the most distinctive regions, using the information previously learned by the expert CapsNet as a reference, while the finer-scale model takes as input an amplified attended patch from the preceding scale. Overall, CTF-CapsNet is driven by three focal margin losses between label predictions and ground truth, and three regeneration losses between the original input images/feature maps and the reconstructed images. Experiments demonstrate that, without any prior knowledge or strongly supervised support (e.g., bounding-box/part annotations), CTF-CapsNet delivers categorization performance competitive with state-of-the-art methods: testing accuracy reaches 89.57%, 88.63%, 90.51%, and 91.53% on our hand-crafted rice growth image set and three public benchmarks (CUB Birds, Stanford Dogs, and Stanford Cars), respectively. (c) 2021 Elsevier B.V. All rights reserved.
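The abstract describes a training objective that sums one classification term and one reconstruction term per scale, over three scales. The sketch below illustrates that structure only. It is not the authors' implementation: the abstract does not give the form of the "focal margin loss", so the standard CapsNet margin loss is swapped in as a stand-in, and the function names, the dictionary keys (`lengths`, `input`, `recon`), the thresholds, and `recon_weight` are all illustrative assumptions.

```python
def margin_loss(capsule_lengths, target, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Standard CapsNet margin loss for one sample.

    Stand-in for the paper's focal margin loss, whose exact form
    is not stated in the abstract (assumption).
    """
    total = 0.0
    for k, length in enumerate(capsule_lengths):
        if k == target:
            # Penalize a too-short capsule for the true class.
            total += max(0.0, m_pos - length) ** 2
        else:
            # Penalize too-long capsules for the other classes.
            total += lam * max(0.0, length - m_neg) ** 2
    return total


def reconstruction_loss(original, reconstructed):
    """Mean squared error between a flattened input and its reconstruction."""
    squared = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return sum(squared) / len(squared)


def total_loss(scales, target, recon_weight=0.0005):
    """Sum the classification and reconstruction terms over all scales.

    `scales` is a list of dicts, one per perception scale (hypothetical
    layout); `recon_weight` is an assumed balancing factor.
    """
    loss = 0.0
    for scale in scales:
        loss += margin_loss(scale["lengths"], target)
        loss += recon_weight * reconstruction_loss(scale["input"], scale["recon"])
    return loss
```

With three entries in `scales`, `total_loss` adds three margin terms and three reconstruction terms, mirroring the six losses named in the abstract. How the paper actually weights them is not specified there.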
Pages: 200-219
Number of pages: 20
Related papers
50 in total
  • [31] Coarse-to-fine multiscale fusion network for single image deraining
    Zhang, Jiahao
    Zhang, Juan
    Wu, Xing
    Shi, Zhicai
    Hwang, Jenq-Neng
    JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (04)
  • [32] An edge guided coarse-to-fine generative network for image outpainting
    Xu, Yiwen
    Pagnucco, Maurice
    Song, Yang
    NEUROCOMPUTING, 2023, 541
  • [33] Hyperlayer Bilinear Pooling with application to fine-grained categorization and image retrieval
    Sun, Qiule
    Wang, Qilong
    Zhang, Jianxin
    Li, Peihua
    NEUROCOMPUTING, 2018, 282 : 174 - 183
  • [34] Fine-Grained Visual Categorization by Localizing Object Parts With Single Image
    Zheng, Xiangtao
    Qi, Lei
    Ren, Yutao
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 1187 - 1199
  • [35] Posture-guided part learning for fine-grained image categorization
    Song, Wei
    Chen, Dongmei
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (03) : 33013
  • [36] A Survey on Fine-grained Image Categorization Using Deep Convolutional Features
    Luo J.-H.
    Wu J.-X.
    Wu, Jian-Xin (wujx2001@nju.edu.cn), 1600, Science Press (43): : 1306 - 1318
  • [37] Detecting Densely Distributed Graph Patterns for Fine-Grained Image Categorization
    Zhang, Luming
    Yang, Yang
    Wang, Meng
    Hong, Richang
    Nie, Liqiang
    Li, Xuelong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (02) : 553 - 565
  • [38] DATA-DRIVEN TAXONOMY FOREST FOR FINE-GRAINED IMAGE CATEGORIZATION
    Wu, Xiaomeng
    Mori, Minoru
    Kashino, Kunio
    2015 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), 2015,
  • [39] A Coarse-to-Fine Network for Craniopharyngioma Segmentation
    Yu, Yijie
    Zhang, Lei
    Shu, Xin
    Wang, Zizhou
    Chen, Chaoyue
    Xu, Jianguo
    MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2022, 2022, 13583 : 91 - 100
  • [40] Fine-Grained Image Search
    Xie, Lingxi
    Wang, Jingdong
    Zhang, Bo
    Tian, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2015, 17 (05) : 636 - 647