Scalable Zero-Shot Learning via Binary Visual-Semantic Embeddings

Cited by: 28
Authors
Shen, Fumin [1, 2]
Zhou, Xiang [1, 2]
Yu, Jun [3]
Yang, Yang [1, 2]
Liu, Li [4]
Shen, Heng Tao [1, 2]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu 610054, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610054, Sichuan, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou 310018, Zhejiang, Peoples R China
[4] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
Funding
National Natural Science Foundation of China;
Keywords
Zero-shot learning; binary embeddings;
DOI
10.1109/TIP.2019.2899987
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Zero-shot learning aims to classify visual instances from unseen classes in the absence of training examples. This is typically achieved by directly mapping visual features to a semantic embedding space of classes (e.g., attributes or word vectors), where the similarity between the two modalities can be readily measured. However, the semantic space may not be reliable for recognition because of noisy class embeddings or the visual bias problem. In this paper, we propose a novel binary embedding-based zero-shot learning (BZSL) method, which recognizes visual instances from unseen classes through an intermediate discriminative Hamming space. Specifically, BZSL jointly learns two binary coding functions to encode both visual instances and class embeddings into the Hamming space, which alleviates the visual-semantic bias problem. As a desirable property, an unseen instance can then be classified efficiently by retrieving the class code nearest to it in Hamming distance. During training, by introducing two auxiliary variables for the coding functions, we formulate an equivalent correlation maximization problem, which admits an analytical solution. The resulting algorithm thus enjoys both highly efficient training and scalable inference on novel classes. Extensive experiments on four benchmark datasets, including the full ImageNet Fall 2011 dataset with over 20k unseen classes, demonstrate the superiority of our method on the zero-shot learning task. Particularly, we show that increasing the binary embedding dimension can inevitably improve the recognition accuracy.
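The inference step described in the abstract, assigning an instance to the class whose binary code is nearest in Hamming distance, can be illustrated with a minimal Python sketch. This is not the BZSL method itself: the learned coding functions are not reproduced here, and the 8-bit codes, class names, and helper functions (`pack_bits`, `hamming_distance`, `nearest_class`) are hypothetical values and names chosen only for illustration.

```python
from typing import Mapping, Sequence


def pack_bits(bits: Sequence[int]) -> int:
    """Pack a sequence of 0/1 values into a single integer code."""
    code = 0
    for bit in bits:
        code = (code << 1) | (1 if bit else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two packed codes differ."""
    return bin(a ^ b).count("1")


def nearest_class(instance_code: int, class_codes: Mapping[str, int]) -> str:
    """Return the class whose code has minimal Hamming distance to the instance."""
    return min(
        class_codes,
        key=lambda name: hamming_distance(instance_code, class_codes[name]),
    )


# Hypothetical 8-bit codes for three unseen classes (made-up values; in the
# paper these would come from the learned class coding function).
class_codes = {
    "zebra": pack_bits([1, 0, 1, 1, 0, 0, 1, 0]),
    "whale": pack_bits([0, 1, 0, 0, 1, 1, 0, 1]),
    "otter": pack_bits([1, 1, 0, 1, 0, 1, 0, 0]),
}

# Hypothetical code of a test image; it differs from "zebra" in one bit.
query = pack_bits([1, 0, 1, 1, 0, 1, 1, 0])
predicted = nearest_class(query, class_codes)
print(predicted)
```

Because each comparison is an XOR followed by a bit count, the cost per class is small, which is consistent with the abstract's claim of scalability to a large number of unseen classes.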
Pages: 3662 - 3674
Number of pages: 13
Related papers
  • [1] Zero-shot learning via visual-semantic aligned autoencoder
    Wei, Tianshu
    Huang, Jinjie
    Jin, Cong
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (08) : 14081 - 14095
  • [2] Transductive Visual-Semantic Embedding for Zero-shot Learning
    Xu, Xing
    Shen, Fumin
    Yang, Yang
    Shao, Jie
    Huang, Zi
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017, : 41 - 49
  • [3] Visual-Semantic Aligned Bidirectional Network for Zero-Shot Learning
    Gao, Rui
    Hou, Xingsong
    Qin, Jie
    Shen, Yuming
    Long, Yang
    Liu, Li
    Zhang, Zhao
    Shao, Ling
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1649 - 1664
  • [4] Visual-Semantic Graph Matching Net for Zero-Shot Learning
    Duan, Bowen
    Chen, Shiming
    Guo, Yufei
    Xie, Guo-Sen
    Ding, Weiping
    Wang, Yisong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [5] Zero-shot image classification via Visual-Semantic Feature Decoupling
    Sun, Xin
    Tian, Yu
    Li, Haojie
    MULTIMEDIA SYSTEMS, 2024, 30 (02)
  • [6] Visual-semantic consistency matching network for generalized zero-shot learning
    Zhang, Zhenqi
    Cao, Wenming
    NEUROCOMPUTING, 2023, 536 : 30 - 39
  • [7] Zero-shot learning with visual-semantic mutual reinforcement for image recognition
    Zhang, Yuhong
    Chen, Taohong
    Yu, Kui
    Hua, Xuegang
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (05)
  • [8] Zero-Shot Learning via Category-Specific Visual-Semantic Mapping and Label Refinement
    Niu, Li
    Cai, Jianfei
    Veeraraghavan, Ashok
    Zhang, Liqing
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (02) : 965 - 979
  • [9] Improved Visual-Semantic Alignment for Zero-Shot Object Detection
    Rahman, Shafin
    Khan, Salman
    Barnes, Nick
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11932 - 11939
  • [10] Indirect visual-semantic alignment for generalized zero-shot recognition
    Chen, Yan-He
    Yeh, Mei-Chen
    MULTIMEDIA SYSTEMS, 2024, 30 (02)