Zero-shot Video Classification with Appropriate Web and Task Knowledge Transfer

Cited by: 7
Authors
Zhuo, Junbao [1]
Zhu, Yan [2]
Cui, Shuhao [3]
Wang, Shuhui [1, 4]
Ma, Bin [3]
Huang, Qingming [1, 2]
Wei, Xiaoming [3]
Wei, Xiaolin [3]
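Affiliations
[1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing, People's Republic of China
[2] University of Chinese Academy of Sciences, Beijing, People's Republic of China
[3] Meituan Inc., Beijing, People's Republic of China
[4] Peng Cheng Laboratory, Shenzhen, People's Republic of China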
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Zero-shot Video Classification; Transfer Learning
DOI
10.1145/3503161.3548008
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Zero-shot video classification (ZSVC), which aims to recognize video classes that were never seen during model training, has become a thriving research direction. ZSVC is achieved by building mappings between visual and semantic embeddings. Recently, ZSVC has been achieved by automatically mining the underlying objects in videos as attributes and incorporating external commonsense knowledge. However, the objects mined from seen categories cannot be generalized to unseen ones. Moreover, the category-object relationships are usually extracted from commonsense knowledge or word embeddings, which are not consistent with the video modality. To tackle these issues, we propose to mine associated objects and category-object relationships for each category from retrieved web images. The associated objects of all categories are employed as generic attributes, and the mined category-object relationships narrow the modality inconsistency for better knowledge transfer. Another issue with existing ZSVC methods is that a model sufficiently trained on labeled seen categories may not generalize well to distinct unseen categories. To encourage a more reliable transfer, we propose Task Similarity aware Representation Learning (TSRL). In TSRL, the similarity between seen categories and unseen ones is estimated and used to regularize the model in an appropriate way. We construct a model for ZSVC based on the constructed attributes, the mined category-object relationships, and the proposed TSRL. Experimental results on four public datasets, i.e., FCVID, UCF101, HMDB51, and Olympic Sports, show that our model performs favorably against state-of-the-art methods. Our code is publicly available at https://github.com/junbaoZHUO/TSRL.
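The two general ideas mentioned in the abstract, matching a video to unseen classes in a shared embedding space and estimating how similar seen categories are to unseen ones, can be illustrated with a minimal sketch. This is not the authors' TSRL implementation: the function names (`classify_zero_shot`, `task_similarity`), the use of cosine similarity, the max-over-unseen aggregation, and the toy three-dimensional embeddings are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify_zero_shot(video_embedding, class_embeddings):
    """Pick the class whose semantic embedding is closest to the video embedding.

    Assumes the video embedding has already been mapped into the same space
    as the class embeddings (hypothetical setup, not the paper's model).
    """
    return max(
        class_embeddings,
        key=lambda name: cosine(video_embedding, class_embeddings[name]),
    )


def task_similarity(seen, unseen):
    """For each seen class, the highest cosine similarity to any unseen class.

    One possible (assumed) way to score how relevant a seen class is to the
    unseen task; such a score could, for example, weight a training loss.
    """
    return {
        s_name: max(cosine(s_vec, u_vec) for u_vec in unseen.values())
        for s_name, s_vec in seen.items()
    }


# Toy embeddings, invented for this example.
seen = {"playing guitar": [0.9, 0.1, 0.0], "swimming": [0.0, 0.2, 0.9]}
unseen = {"playing violin": [0.8, 0.3, 0.0], "surfing": [0.1, 0.1, 0.9]}
video = [0.7, 0.4, 0.1]

prediction = classify_zero_shot(video, unseen)
weights = task_similarity(seen, unseen)
print(prediction)
print(weights)
```

With these toy vectors the video lies closest to "playing violin", and each seen class receives a high score because it has a near neighbor among the unseen classes. How the paper actually estimates task similarity and applies it as a regularizer is described in the paper and its repository, not here.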
Pages: 5761-5772
Page count: 12