Leveraging Self-Distillation and Disentanglement Network to Enhance Visual-Semantic Feature Consistency in Generalized Zero-Shot Learning

Cited by: 0
Authors
Liu, Xiaoming [1 ,2 ,3 ]
Wang, Chen [1 ,2 ]
Yang, Guan [1 ,2 ]
Wang, Chunhua [4 ]
Long, Yang [5 ]
Liu, Jie [3 ,6 ]
Zhang, Zhiyuan [1 ,2 ]
Affiliations
[1] Zhongyuan Univ Technol, Sch Comp Sci, Zhengzhou 450007, Peoples R China
[2] Zhengzhou Key Lab Text Proc & Image Understanding, Zhengzhou 450007, Peoples R China
[3] Res Ctr Language Intelligence China, Beijing 100089, Peoples R China
[4] Huanghuai Univ, Sch Animat Acad, Zhumadian 463000, Peoples R China
[5] Univ Durham, Dept Comp Sci, Durham DH1 3LE, England
[6] North China Univ Technol, Sch Informat Sci, Beijing 100144, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
generalized zero-shot learning; self-distillation; disentanglement network; visual-semantic feature consistency;
DOI
10.3390/electronics13101977
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Generalized zero-shot learning (GZSL) aims to recognize both seen and unseen classes while training only on seen-class samples and auxiliary semantic descriptions. Recent state-of-the-art methods either infer unseen classes directly from semantic information or synthesize unseen-class samples with semantics-conditioned generative models; both strategies depend on correctly aligned visual-semantic features. However, they often overlook the inconsistency between original visual features and semantic attributes. Moreover, because of cross-modal dataset biases, the visual features the model extracts or synthesizes may mismatch some semantic features, preventing proper visual-semantic alignment. To address this issue, this paper proposes a GZSL framework that enhances visual-semantic feature consistency through a self-distillation and disentanglement network (SDDN), which yields semantically consistent refined visual features and non-redundant semantic features. First, SDDN applies self-distillation to refine the visual features the model extracts and synthesizes. The visual and semantic features are then disentangled and aligned by a disentanglement network to enhance their consistency. Finally, the consistent visual-semantic features are fused to jointly train a GZSL classifier. Extensive experiments demonstrate that the proposed method achieves competitive results on four challenging benchmark datasets (AWA2, CUB, FLO, and SUN).
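The abstract outlines a three-stage pipeline: self-distillation to refine visual features, disentanglement-based visual-semantic alignment, and fused classifier training. As a rough illustration of the loss terms such a pipeline typically involves, the sketch below shows a temperature-scaled self-distillation loss and a cosine-distance alignment loss in plain Python. All function names, shapes, and hyperparameters here are illustrative assumptions, not the paper's actual SDDN implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a flat list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def self_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return kl * temperature ** 2


def alignment_loss(visual_factor, attributes):
    """Cosine-distance alignment between a disentangled, semantically
    consistent visual factor and a class attribute vector."""
    dot = sum(v * a for v, a in zip(visual_factor, attributes))
    nv = math.sqrt(sum(v * v for v in visual_factor))
    na = math.sqrt(sum(a * a for a in attributes))
    return 1.0 - dot / (nv * na)


# Illustrative usage: a refined "teacher" prediction supervises the student,
# while a visual factor is pulled toward its class attribute vector.
teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.8, -0.5]
distill = self_distillation_loss(student, teacher)
align = alignment_loss([0.9, 0.1, 0.4], [1.0, 0.0, 0.5])
```

In a real model these two terms would be computed over mini-batches of network outputs and combined with a classification loss; the pure-Python version above only makes the arithmetic of each term explicit.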
Pages: 18
Related Papers
50 records in total
  • [21] Deep quantization network with visual-semantic alignment for zero-shot image retrieval
    Liu, Huixia
    Qin, Zhihong
    ELECTRONIC RESEARCH ARCHIVE, 2023, 31 (07): : 4232 - 4247
  • [22] Dual Expert Distillation Network for Generalized Zero-Shot Learning
    Rao, Zhijie
    Guo, Jingcai
    Lu, Xiaocheng
    Liang, Jingming
    Zhang, Jie
    Wang, Haozhao
    Wei, Kang
    Cao, Xiaofeng
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 4833 - 4841
  • [23] Improved Visual-Semantic Alignment for Zero-Shot Object Detection
    Rahman, Shafin
    Khan, Salman
    Barnes, Nick
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11932 - 11939
  • [24] Contrastive visual feature filtering for generalized zero-shot learning
    Meng, Shixuan
    Jiang, Rongxin
    Tian, Xiang
    Zhou, Fan
    Chen, Yaowu
    Liu, Junjie
    Shen, Chen
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024,
  • [25] Graph-Based Visual-Semantic Entanglement Network for Zero-Shot Image Recognition
    Hu, Yang
    Wen, Guihua
    Chapman, Adriane
    Yang, Pei
    Luo, Mingnan
    Xu, Yingxue
    Dai, Dan
    Hall, Wendy
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 2473 - 2487
  • [26] Semantic-Visual Consistency Constraint Network for Zero-Shot Image Semantic Segmentation
    Chen, Qiong
    Feng, Yuan
    Li, Zhiqun
    Yang, Yong
    JOURNAL OF SOUTH CHINA UNIVERSITY OF TECHNOLOGY (NATURAL SCIENCE), 2024, 52 (10): : 41 - 50
  • [27] SEMANTIC MANIFOLD ALIGNMENT IN VISUAL FEATURE SPACE FOR ZERO-SHOT LEARNING
    Liao, Changsu
    Su, Li
    Zhang, Wegang
    Huang, Qingming
    2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2018,
  • [28] Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation
    Cheng, Ruizhe
    Wu, Bichen
    Zhang, Peizhao
    Vajda, Peter
    Gonzalez, Joseph E.
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3113 - 3118
  • [29] Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval
    Ueki, Kazuya
    20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 628 - 634
  • [30] Hierarchical Semantic Loss and Confidence Estimator for Visual-Semantic Embedding-Based Zero-Shot Learning
    Seo, Sanghyun
    Kim, Juntae
    APPLIED SCIENCES-BASEL, 2019, 9 (15):