Scalable Zero-Shot Learning via Binary Visual-Semantic Embeddings

Cited by: 28
Authors
Shen, Fumin [1 ,2 ]
Zhou, Xiang [1 ,2 ]
Yu, Jun [3 ]
Yang, Yang [1 ,2 ]
Liu, Li [4 ]
Shen, Heng Tao [1 ,2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu 610054, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610054, Sichuan, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou 310018, Zhejiang, Peoples R China
[4] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
Funding
National Natural Science Foundation of China;
Keywords
Zero-shot learning; binary embeddings;
DOI
10.1109/TIP.2019.2899987
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Zero-shot learning aims to classify visual instances from unseen classes in the absence of training examples. This is typically achieved by directly mapping visual features to a semantic embedding space of classes (e.g., attributes or word vectors), where the similarity between the two modalities can be readily measured. However, the semantic space may not be reliable for recognition due to noisy class embeddings or the visual bias problem. In this paper, we propose a novel binary embedding-based zero-shot learning (BZSL) method, which recognizes visual instances from unseen classes through an intermediate discriminative Hamming space. Specifically, BZSL jointly learns two binary coding functions that encode both visual instances and class embeddings into the Hamming space, which effectively alleviates the visual-semantic bias problem. As a desirable property, classifying an unseen instance can thereby be done efficiently by retrieving the class code with the minimal Hamming distance. During training, by introducing two auxiliary variables for the coding functions, we formulate an equivalent correlation maximization problem that admits an analytical solution. The resulting algorithm thus enjoys both highly efficient training and scalable inference for novel classes. Extensive experiments on four benchmark datasets, including the full ImageNet Fall 2011 dataset with over 20k unseen classes, demonstrate the superiority of our method on the zero-shot learning task. In particular, we show that increasing the binary embedding dimension consistently improves recognition accuracy.
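The inference step described in the abstract reduces to a nearest-neighbour search in Hamming space. The sketch below is not the authors' code: it assumes the learned coding functions have already produced binary codes for the test images and the unseen-class embeddings, and the function and variable names (hamming_distance, zero_shot_classify, d) are illustrative only. It shows how an unseen instance would be assigned to the class whose binary code is closest in Hamming distance.

```python
# Minimal sketch of zero-shot classification in a Hamming space (assumed setup,
# not the BZSL implementation): binary codes are given, classification is a
# nearest-class-code lookup under Hamming distance.
import numpy as np

def hamming_distance(a, b):
    """Pairwise Hamming distances between {0,1}-valued code matrices.

    a: (n, d) image codes, b: (c, d) class codes -> (n, c) distance matrix.
    """
    return (a[:, None, :] != b[None, :, :]).sum(axis=-1)

def zero_shot_classify(image_codes, class_codes, unseen_labels):
    """Assign each image to the unseen class whose code has minimal Hamming distance."""
    dists = hamming_distance(image_codes, class_codes)
    return unseen_labels[np.argmin(dists, axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 64                                            # binary embedding dimension (hypothetical)
    class_codes = rng.integers(0, 2, size=(5, d))     # codes of 5 unseen classes
    image_codes = rng.integers(0, 2, size=(10, d))    # codes of 10 test instances
    labels = np.array(["cls%d" % i for i in range(5)])
    print(zero_shot_classify(image_codes, class_codes, labels))
```

In practice the distance between packed binary codes can be computed with bitwise XOR and popcount, which is what makes retrieval over a very large set of unseen classes scalable; the dense boolean comparison above is written for readability only.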
Pages: 3662-3674
Number of pages: 13