SAVE: Self-Attention on Visual Embedding for Zero-Shot Generic Object Counting

Cited by: 0

Authors:

Zgaren, Ahmed [1,2]

Bouachir, Wassim [2]

Bouguila, Nizar [1]

Affiliations:

[1] Concordia Univ, Concordia Inst Informat & Syst Engn CIISE, Montreal, PQ H3G 1M8, Canada

[2] Univ Quebec TELUQ, Data Sci Lab, Montreal, PQ H2S 3L5, Canada

Source:

JOURNAL OF IMAGING | 2025 / Vol. 11 / Issue 02

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

object counting; transformers; visual attention; zero-shot; class-agnostic;

DOI:

10.3390/jimaging11020052

Chinese Library Classification (CLC) number:

TB8 [Photographic technology];

Discipline classification code:

0804;

Abstract:

Zero-shot counting is a subcategory of Generic Visual Object Counting, which aims to count objects from an arbitrary class in a given image. While few-shot counting relies on delivering exemplars to the model to count similar class objects, zero-shot counting automates the operation for faster processing. This paper proposes a fully automated zero-shot method outperforming both zero-shot and few-shot methods. By exploiting feature maps from a pre-trained detection-based backbone, we introduce a new Visual Embedding Module designed to generate semantic embeddings within object contextual information. These embeddings are then fed to a Self-Attention Matching Module to generate an encoded representation for the head counter. Our proposed method has outperformed recent zero-shot approaches, achieving the best Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) results of 8.89 and 35.83, respectively, on the FSC147 dataset. Additionally, our method demonstrates competitive performance compared to few-shot methods, advancing the capabilities of visual object counting in various industrial applications such as tree counting, wildlife animal counting, and medical applications like blood cell counting.
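
The abstract reports results as Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) over per-image counts. As a minimal sketch of how these two metrics are conventionally computed for counting, assuming one predicted count and one ground-truth count per image (the function names and sample values below are illustrative and not taken from the paper):

```python
import math


def mean_absolute_error(predicted, ground_truth):
    """Average absolute difference between predicted and true counts."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


def root_mean_square_error(predicted, ground_truth):
    """Square root of the average squared difference between counts."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((p - g) ** 2 for p, g in zip(predicted, ground_truth))
    return math.sqrt(total / len(predicted))


# Hypothetical per-image counts, for illustration only.
predicted_counts = [10, 22, 5]
true_counts = [12, 20, 5]

print(mean_absolute_error(predicted_counts, true_counts))
print(root_mean_square_error(predicted_counts, true_counts))
```

Because RMSE squares each error before averaging, a few images with very large miscounts raise it far more than they raise MAE, which is one common reading of a gap such as 8.89 versus 35.83.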


Pages: 21
