Learning visual similarity for image retrieval with global descriptors and capsule networks

Cited by: 0
Authors
Durmus, Duygu [1]
Gudukbay, Ugur [1]
Ulusoy, Ozgur [1]
Affiliations
[1] Bilkent Univ, Dept Comp Engn, TR-06800 Ankara, Turkiye
Keywords
Deep learning; Neural networks; Capsule networks; Global descriptors; Image retrieval; Triplet loss; Cost-sensitive regularized cross-entropy loss
DOI
10.1007/s11042-023-16164-5
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Finding matching images across large and unstructured datasets is vital in many computer vision applications. With the emergence of deep learning-based solutions, various visual tasks, such as image retrieval, have been successfully addressed. Learning visual similarity is crucial for image matching and retrieval tasks. Capsule Networks enable learning richer information that describes the object without losing the essential spatial relationship between the object and its parts. In addition, global descriptors are widely used for representing images. We propose a framework that combines the power of global descriptors and Capsule Networks, benefiting from the information in multiple views of images to enhance image retrieval performance. The Spatial Grouping Enhance strategy, which enhances sub-features in parallel, and self-attention layers, which explore global dependencies within the internal representations of images, are used to strengthen the image representations. The approach captures resemblances between similar images and differences between dissimilar images using triplet loss and cost-sensitive regularized cross-entropy loss. The results are superior to those of state-of-the-art approaches on the Stanford Online Products dataset, with Recall@K of 85.0, 94.4, 97.8, and 99.3 for K = 1, 10, 100, and 1000, respectively.
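The abstract names two standard ingredients, triplet loss and the Recall@K metric, without giving details. The sketch below illustrates the textbook forms of both in plain Python. It is not the authors' implementation: the Euclidean distance, the margin of 0.2, the function names, and the toy embeddings and labels are all assumptions made for illustration, and the abstract does not state which distance, margin, or triplet mining scheme the paper uses.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Textbook triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin` (a hypothetical value here).
    """
    return max(0.0, l2_distance(anchor, positive)
               - l2_distance(anchor, negative) + margin)


def recall_at_k(embeddings, labels, k):
    """Fraction of queries whose k nearest neighbours (excluding the query
    itself) contain at least one item with the same label."""
    hits = 0
    for i, query in enumerate(embeddings):
        # Rank every other item by its distance to the query.
        ranked = sorted(
            (j for j in range(len(embeddings)) if j != i),
            key=lambda j: l2_distance(query, embeddings[j]),
        )
        if any(labels[j] == labels[i] for j in ranked[:k]):
            hits += 1
    return hits / len(embeddings)


# Toy 2-D embeddings and class labels, invented for illustration only.
toy_embeddings = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0),
                  (1.1, 1.0), (0.2, 0.1), (0.9, 1.2)]
toy_labels = ["chair", "chair", "lamp", "lamp", "lamp", "chair"]

if __name__ == "__main__":
    for k in (1, 4):
        print(f"Recall@{k} = {recall_at_k(toy_embeddings, toy_labels, k):.3f}")
```

Under this definition, Recall@K can only stay the same or increase as K grows, which matches the ordering of the figures reported in the abstract.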
Pages: 20243-20263
Number of pages: 21
Related papers
50 in total
  • [31] Integrating Visual and Semantic Similarity Using Hierarchies for Image Retrieval
    Venkataramanan, Aishwarya
    Laviale, Martin
    Pradalier, Cedric
    COMPUTER VISION SYSTEMS, ICVS 2023, 2023, 14253 : 422 - 431
  • [32] Performance analysis of various local and global shape descriptors for image retrieval
    Singh, Chandan
    Sharma, Pooja
    MULTIMEDIA SYSTEMS, 2013, 19 (04) : 339 - 357
  • [33] Methods for the construction of image descriptors for the global visual localization problem
    Nedoshivina, L. S.
    Peterson, M. V.
    JOURNAL OF OPTICAL TECHNOLOGY, 2017, 84 (06) : 377 - 383
  • [34] Learning Context-Content Similarity for Image Retrieval
    Singh, Shirish
    Xie, Meng
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2017 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS (UBICOMP/ISWC '17 ADJUNCT), 2017, : 201 - 204
  • [35] Semisupervised Online Multikernel Similarity Learning for Image Retrieval
    Liang, Jianqing
    Hu, Qinghua
    Wang, Wenwu
    Han, Yahong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2017, 19 (05) : 1077 - 1089
  • [36] Structure Similarity Preservation Learning for Asymmetric Image Retrieval
    Wu, Hui
    Wang, Min
    Zhou, Wengang
    Li, Houqiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 4693 - 4705
  • [37] Comparison of 3D local and global descriptors for similarity retrieval of range data
    Bayramoglu, Neslihan
    Alatan, A. Aydin
    NEUROCOMPUTING, 2016, 184 : 13 - 27
  • [38] Image-to-Image Retrieval by Learning Similarity between Scene Graphs
    Yoon, Sangwoong
    Kang, Woo Young
    Jeon, Sungwook
    Lee, SeongEun
    Han, Changjin
    Park, Jonghun
    Kim, Eun-Sol
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10718 - 10726
  • [39] Interactive Relevance Visual Learning for Image Retrieval
    Fu, Hsin-Chia
    Wang, Z. H.
    Wang, W. J.
    Pao, Hsiao-Tien
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, PT I (IWANN 2015), 2015, 9094 : 227 - 240
  • [40] Capsule Networks, but Not Convolutional Networks, Explain Global Configurational Visual Effects
    Doerig, Adrien
    Bornet, Alban
    Herzog, Michael H.
    PERCEPTION, 2019, 48 : 48 - 48