Learning Modality-Invariant Latent Representations for Generalized Zero-shot Learning

Cited by: 25
Authors
Li, Jingjing [1 ]
Jing, Mengmeng [1 ]
Zhu, Lei [2 ]
Ding, Zhengming [3 ]
Lu, Ke [1 ]
Yang, Yang [1 ]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu, Peoples R China
[2] Shandong Normal Univ, Jinan, Shandong, Peoples R China
[3] Indiana Univ Purdue Univ, Indianapolis, IN 46202 USA
Funding
National Natural Science Foundation of China;
Keywords
Zero-shot learning; mutual information estimation; generalized ZSL; variational autoencoders;
DOI
10.1145/3394171.3413503
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, feature generating methods have been successfully applied to zero-shot learning (ZSL). However, most previous approaches only generate visual representations for zero-shot recognition. In fact, typical ZSL is a classic multi-modal learning protocol which consists of a visual space and a semantic space. In this paper, therefore, we present a new method which can simultaneously generate both visual representations and semantic representations, so that the essential multi-modal information associated with unseen classes can be captured. Specifically, we address the most challenging issue in such a paradigm, i.e., how to handle the domain shift and thus guarantee that the learned representations are modality-invariant. To this end, we propose two strategies: 1) leveraging the mutual information between the latent visual representations and the semantic representations; 2) maximizing the entropy of the joint distribution of the two latent representations. By leveraging the two strategies, we argue that the two modalities can be well aligned. Finally, extensive experiments on five widely used datasets verify that the proposed method significantly outperforms the previous state of the art.
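The first strategy in the abstract rests on estimating the mutual information (MI) between latent codes from the two modalities. As a rough, hypothetical sketch (not the paper's actual model, which uses variational autoencoders and its own MI estimator), the snippet below computes an InfoNCE-style lower bound on the MI between paired visual and semantic latents produced by stand-in linear encoders; all names, dimensions, and the toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Stand-in linear encoder into a shared latent space (the paper's
    encoders are VAEs; this is only a simplified illustration)."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize

def info_nce_lower_bound(z_v, z_s, temperature=0.1):
    """InfoNCE estimate over a batch of paired latents: a lower bound on
    the mutual information, capped at log(batch size) nats."""
    logits = z_v @ z_s.T / temperature              # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return np.log(len(z_v)) + np.mean(np.diag(log_probs))

# Toy paired data sharing a common latent factor: 64 samples,
# 2048-d visual features, 85-d attribute vectors (dimensions illustrative).
B, dv, ds, dz = 64, 2048, 85, 128
shared = rng.normal(size=(B, dz))
x_vis = shared @ rng.normal(size=(dz, dv)) + 0.1 * rng.normal(size=(B, dv))
x_sem = shared @ rng.normal(size=(dz, ds)) + 0.1 * rng.normal(size=(B, ds))

W_v = rng.normal(size=(dv, dz)) * 0.01
W_s = rng.normal(size=(ds, dz)) * 0.01
mi = info_nce_lower_bound(encode(x_vis, W_v), encode(x_sem, W_s))
print(f"InfoNCE MI lower bound (nats): {mi:.3f}")
```

In a training loop one would maximize this bound with respect to the encoder parameters, pulling matched visual/semantic pairs together in the latent space; the second strategy (joint-entropy maximization) would add a separate regularizer not shown here.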
Pages: 1348-1356 (9 pages)
Related Papers
50 total; 10 shown
  • [1] Learning MLatent Representations for Generalized Zero-Shot Learning
    Ye, Yalan
    Pan, Tongjie
    Luo, Tonghoujun
    Li, Jingjing
    Shen, Heng Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 2252 - 2265
  • [2] Learning Invariant Visual Representations for Compositional Zero-Shot Learning
    Zhang, Tian
    Liang, Kongming
    Du, Ruoyi
    Sun, Xian
    Ma, Zhanyu
    Guo, Jun
    COMPUTER VISION, ECCV 2022, PT XXIV, 2022, 13684 : 339 - 355
  • [3] LEARNING MODALITY-INVARIANT REPRESENTATIONS FOR SPEECH AND IMAGES
    Leidal, Kenneth
    Harwath, David
    Glass, James
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 424 - 429
  • [4] Learning domain invariant unseen features for generalized zero-shot classification
    Li, Xiao
    Fang, Min
    Li, Haikun
    Wu, Jinqiao
    KNOWLEDGE-BASED SYSTEMS, 2020, 206
  • [5] Contrastive semantic disentanglement in latent space for generalized zero-shot learning
    Fan, Wentao
    Liang, Chen
    Wang, Tian
    KNOWLEDGE-BASED SYSTEMS, 2022, 257
  • [7] Enhancing Domain-Invariant Parts for Generalized Zero-Shot Learning
    Zhang, Yang
    Feng, Songhe
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6283 - 6291
  • [8] Generalized zero-shot learning via discriminative and transferable disentangled representations
    Zhang, Chunyu
    Li, Zhanshan
    NEURAL NETWORKS, 2025, 183
  • [9] Meta-Learning for Generalized Zero-Shot Learning
    Verma, Vinay Kumar
    Brahma, Dhanajit
    Rai, Piyush
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 6062 - 6069
  • [10] Learning the Compositional Domains for Generalized Zero-shot Learning
    Dong, Hanze
    Fu, Yanwei
    Hwang, Sung Ju
    Sigal, Leonid
    Xue, Xiangyang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2022, 221