Leveraging Self-Distillation and Disentanglement Network to Enhance Visual-Semantic Feature Consistency in Generalized Zero-Shot Learning

Cited by: 0
|
Authors
Liu, Xiaoming [1 ,2 ,3 ]
Wang, Chen [1 ,2 ]
Yang, Guan [1 ,2 ]
Wang, Chunhua [4 ]
Long, Yang [5 ]
Liu, Jie [3 ,6 ]
Zhang, Zhiyuan [1 ,2 ]
Affiliations
[1] Zhongyuan Univ Technol, Sch Comp Sci, Zhengzhou 450007, Peoples R China
[2] Zhengzhou Key Lab Text Proc & Image Understanding, Zhengzhou 450007, Peoples R China
[3] Res Ctr Language Intelligence China, Beijing 100089, Peoples R China
[4] Huanghuai Univ, Sch Animat Acad, Zhumadian 463000, Peoples R China
[5] Univ Durham, Dept Comp Sci, Durham DH1 3LE, England
[6] North China Univ Technol, Sch Informat Sci, Beijing 100144, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
generalized zero-shot learning; self-distillation; disentanglement network; visual-semantic feature consistency;
DOI
10.3390/electronics13101977
CLC number
TP [Automation and computer technology];
Subject classification code
0812;
Abstract
Generalized zero-shot learning (GZSL) aims to recognize both seen and unseen classes while training only on seen-class samples and auxiliary semantic descriptions. Recent state-of-the-art methods either infer unseen classes from semantic information or synthesize unseen-class samples with semantically conditioned generative models; both strategies rely on correctly aligned visual-semantic features. However, they often overlook the inconsistency between the original visual features and the semantic attributes. Moreover, because of cross-modal dataset biases, the visual features the model extracts or synthesizes may mismatch some semantic features, which hinders proper visual-semantic alignment. To address this issue, this paper proposes a GZSL framework that enhances visual-semantic feature consistency through a self-distillation and disentanglement network (SDDN), which produces semantically consistent refined visual features and non-redundant semantic features. First, SDDN applies self-distillation to refine the visual features that the model extracts and synthesizes. The refined visual features and the semantic features are then disentangled and aligned by a disentanglement network to further enhance their consistency. Finally, the consistent visual-semantic features are fused to jointly train a GZSL classifier. Extensive experiments demonstrate that the proposed method achieves competitive results on four challenging benchmark datasets (AWA2, CUB, FLO, and SUN).
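The self-distillation step mentioned in the abstract can be illustrated with a minimal, framework-free sketch. This is an illustrative assumption, not the paper's actual implementation: the function name `self_distillation_loss`, the teacher/student framing, and the temperature value are all hypothetical. The idea shown is the standard one behind distillation losses, where a student's softened class distribution is pulled toward a teacher's softened distribution via KL divergence.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed stably by shifting the max."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL divergence KL(teacher || student) over softened distributions.

    In a self-distillation setting the "teacher" is typically an earlier
    snapshot or deeper branch of the same network rather than a separate model.
    """
    p = softmax(teacher_logits, temperature)  # softened teacher targets
    q = softmax(student_logits, temperature)  # softened student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(np.mean(kl))
```

When student and teacher logits agree, the loss is zero; any disagreement in the softened distributions yields a positive penalty, which is what drives the refined features toward the teacher's representation.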
Pages: 18
Related Papers
50 records
  • [1] Visual-semantic consistency matching network for generalized zero-shot learning
    Zhang, Zhenqi
    Cao, Wenming
    NEUROCOMPUTING, 2023, 536 : 30 - 39
  • [2] Visual-Semantic Aligned Bidirectional Network for Zero-Shot Learning
    Gao, Rui
    Hou, Xingsong
    Qin, Jie
    Shen, Yuming
    Long, Yang
    Liu, Li
    Zhang, Zhao
    Shao, Ling
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1649 - 1664
  • [3] Transductive Visual-Semantic Embedding for Zero-shot Learning
    Xu, Xing
    Shen, Fumin
    Yang, Yang
    Shao, Jie
    Huang, Zi
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017, : 41 - 49
  • [4] Indirect visual-semantic alignment for generalized zero-shot recognition
    Chen, Yan-He
    Yeh, Mei-Chen
    MULTIMEDIA SYSTEMS, 2024, 30 (02)
  • [5] Zero-shot learning via visual-semantic aligned autoencoder
    Wei, Tianshu
    Huang, Jinjie
    Jin, Cong
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (08) : 14081 - 14095
  • [6] VS-Boost: Boosting Visual-Semantic Association for Generalized Zero-Shot Learning
    Li, Xiaofan
    Zhang, Yachao
    Bian, Shiran
    Qu, Yanyun
    Xie, Yuan
    Shi, Zhongchao
    Fan, Jianping
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 1107 - 1115
  • [7] Visual-Semantic Graph Matching Net for Zero-Shot Learning
    Duan, Bowen
    Chen, Shiming
    Guo, Yufei
    Xie, Guo-Sen
    Ding, Weiping
    Wang, Yisong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [8] Two-Level Adversarial Visual-Semantic Coupling for Generalized Zero-shot Learning
    Chandhok, Shivam
    Balasubramanian, Vineeth N.
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 3099 - 3107
  • [9] Zero-shot image classification via Visual-Semantic Feature Decoupling
    Sun, Xin
    Tian, Yu
    Li, Haojie
    MULTIMEDIA SYSTEMS, 2024, 30 (02)
  • [10] Scalable Zero-Shot Learning via Binary Visual-Semantic Embeddings
    Shen, Fumin
    Zhou, Xiang
    Yu, Jun
    Yang, Yang
    Liu, Li
    Shen, Heng Tao
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (07) : 3662 - 3674