Zero-shot Video Classification with Appropriate Web and Task Knowledge Transfer

Cited by: 7
Authors
Zhuo, Junbao [1]
Zhu, Yan [2]
Cui, Shuhao [3]
Wang, Shuhui [1, 4]
Ma, Bin [3]
Huang, Qingming [1, 2]
Wei, Xiaoming [3]
Wei, Xiaolin [3]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Meituan Inc, Beijing, Peoples R China
[4] Peng Cheng Lab, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Zero-shot Video Classification; Transfer Learning;
DOI
10.1145/3503161.3548008
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Zero-shot video classification (ZSVC), which aims to recognize video classes that were never seen during model training, has become a thriving research direction. ZSVC is achieved by building mappings between visual and semantic embeddings. Recently, ZSVC has been achieved by automatically mining the underlying objects in videos as attributes and incorporating external commonsense knowledge. However, the objects mined from seen categories cannot be generalized to unseen ones. In addition, the category-object relationships are usually extracted from commonsense knowledge or word embeddings, which are not consistent with the video modality. To tackle these issues, we propose to mine associated objects and category-object relationships for each category from retrieved web images. The associated objects of all categories are employed as generic attributes, and the mined category-object relationships can narrow the modality inconsistency for better knowledge transfer. Another issue with existing ZSVC methods is that a model sufficiently trained on labeled seen categories may not generalize well to distinct unseen categories. To encourage a more reliable transfer, we propose Task Similarity aware Representation Learning (TSRL). In TSRL, the similarity between seen categories and unseen ones is estimated and used to regularize the model in an appropriate way. We construct a model for ZSVC based on the constructed attributes, the mined category-object relationships, and the proposed TSRL. Experimental results on four public datasets, i.e., FCVID, UCF101, HMDB51 and Olympic Sports, show that our model performs favorably against state-of-the-art methods. Our code is publicly available at https://github.com/junbaoZHUO/TSRL.
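The idea of using objects as attributes and matching them against category-object relationships can be illustrated with a toy sketch. This is not the authors' model or the TSRL code: the function names (`cosine`, `zero_shot_predict`), the three-object vocabulary, and all weights are hypothetical, and cosine similarity is simply one plausible way to compare a video's object scores with each unseen category's object profile.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def zero_shot_predict(object_scores, category_object_relations):
    """Return the category whose object profile best matches the video.

    object_scores: per-object confidences for one video
        (hypothetical output of an object recognizer).
    category_object_relations: dict mapping category name -> per-object weights
        (hypothetical; e.g. derived from images retrieved for that category).
    """
    return max(
        category_object_relations,
        key=lambda name: cosine(object_scores, category_object_relations[name]),
    )


# Toy object vocabulary, for illustration only: [ball, horse, water]
relations = {
    "polo": [0.6, 0.9, 0.0],
    "water polo": [0.7, 0.0, 0.9],
}
video_object_scores = [0.5, 0.1, 0.8]  # a video showing mostly water and a ball
predicted = zero_shot_predict(video_object_scores, relations)
print(predicted)
```

Neither category needs any labeled training videos here; the prediction relies only on the object vocabulary shared between the video and the category profiles, which is the general premise the abstract describes.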
Pages: 5761-5772
Number of pages: 12