Zero-shot Video Classification with Appropriate Web and Task Knowledge Transfer

Cited by: 7
Authors
Zhuo, Junbao [1]
Zhu, Yan [2]
Cui, Shuhao [3]
Wang, Shuhui [1, 4]
Ma, Bin [3]
Huang, Qingming [1, 2]
Wei, Xiaoming [3]
Wei, Xiaolin [3]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Meituan Inc, Beijing, Peoples R China
[4] Peng Cheng Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Zero-shot Video Classification; Transfer Learning;
DOI
10.1145/3503161.3548008
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Zero-shot video classification (ZSVC), which aims to recognize video classes that have never been seen during model training, has become a thriving research direction. ZSVC is achieved by building mappings between visual and semantic embeddings. Recently, ZSVC has been achieved by automatically mining the underlying objects in videos as attributes and incorporating external commonsense knowledge. However, the objects mined from seen categories cannot be generalized to unseen ones. Besides, the category-object relationships are usually extracted from commonsense knowledge or word embeddings, which are not consistent with the video modality. To tackle these issues, we propose to mine associated objects and category-object relationships for each category from retrieved web images. The associated objects of all categories are employed as generic attributes, and the mined category-object relationships can narrow the modality inconsistency for better knowledge transfer. Another issue with existing ZSVC methods is that a model sufficiently trained on labeled seen categories may not generalize well to distinct unseen categories. To encourage a more reliable transfer, we propose Task Similarity aware Representation Learning (TSRL). In TSRL, the similarity between seen categories and unseen ones is estimated and used to regularize the model in an appropriate way. We construct a model for ZSVC based on the constructed attributes, the mined category-object relationships, and the proposed TSRL. Experimental results on four public datasets, i.e., FCVID, UCF101, HMDB51, and Olympic Sports, show that our model performs favorably against state-of-the-art methods. Our code is publicly available at https://github.com/junbaoZHUO/TSRL.
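As a rough illustration of the general idea stated in the abstract (mapping visual features into a semantic space and matching them against class embeddings), the sketch below classifies a feature by cosine similarity to unseen-class embeddings. It is a generic zero-shot baseline, not the authors' model or TSRL; the function names, the linear projection, and all numbers are hypothetical and chosen only for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def project(feature, weights):
    """Linear map from visual space to semantic space.

    `weights` holds one row per semantic dimension (an assumed, toy mapping).
    """
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]

def classify_zero_shot(feature, weights, class_embeddings):
    """Return the class whose embedding is most similar to the projected feature."""
    projected = project(feature, weights)
    return max(class_embeddings,
               key=lambda name: cosine(projected, class_embeddings[name]))

# Made-up numbers purely for illustration: 3-d visual features, 2-d semantic space.
WEIGHTS = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 1.0]]
UNSEEN = {"surfing": [1.0, 0.1], "archery": [0.1, 1.0]}

prediction = classify_zero_shot([0.9, 0.1, 0.0], WEIGHTS, UNSEEN)
print(prediction)
```

In practice the projection would be learned on seen categories only, and the class embeddings would come from attributes or word vectors; the abstract's concern is that such a mapping may not transfer well to dissimilar unseen categories.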
Pages: 5761-5772
Number of pages: 12
Related papers
50 in total
  • [21] Zero-Shot Transfer Learning Framework for Plant Leaf Disease Classification
    Satya Rajendra Singh, R.
    Sanodiya, Rakesh Kumar
    IEEE ACCESS, 2023, 11 : 143861 - 143880
  • [22] Canonical mean filter for almost zero-shot multi-task classification
    Yong Li
    Heng Wang
    Xiang Ye
    Applied Intelligence, 2023, 53 : 24422 - 24434
  • [23] Canonical mean filter for almost zero-shot multi-task classification
    Li, Yong
    Wang, Heng
    Ye, Xiang
    APPLIED INTELLIGENCE, 2023, 53 (20) : 24422 - 24434
  • [24] Fine-Grained Feature Generation for Generalized Zero-Shot Video Classification
    Hong, Mingyao
    Zhang, Xinfeng
    Li, Guorong
    Huang, Qingming
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 1599 - 1612
  • [25] Transferring Knowledge From Text to Video: Zero-Shot Anticipation for Procedural Actions
    Sener, Fadime
    Saraf, Rishabh
    Yao, Angela
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 7836 - 7852
  • [26] Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification
    Pu, Shi
    Zhao, Kaili
    Zheng, Mao
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 19936 - 19945
  • [27] Zero-shot Natural Language Video Localization
    Nam, Jinwoo
    Ahn, Daechul
    Kang, Dongyeop
    Ha, Seong Jong
    Choi, Jonghyun
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1450 - 1459
  • [28] Zero-Shot Dialogue State Tracking via Cross-Task Transfer
    Lin, Zhaojiang
    Liu, Bing
    Madotto, Andrea
    Moon, Seungwhan
    Crook, Paul
    Zhou, Zhenpeng
    Wang, Zhiguang
    Yu, Zhou
    Cho, Eunjoon
    Subba, Rajen
    Fung, Pascale
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 7890 - 7900
  • [29] Zero-shot Transfer Learning within a Heterogeneous Graph via Knowledge Transfer Networks
    Yoon, Minji
    Palowitch, John
    Zelle, Dustin
    Hu, Ziniu
    Salakhutdinov, Ruslan
    Perozzi, Bryan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [30] Class knowledge overlay to visual feature learning for zero-shot image classification
    Xie, Cheng
    Zeng, Ting
    Xiang, Hongxin
    Li, Keqin
    Yang, Yun
    Liu, Qing
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 207