Multi-granularity hierarchical contrastive learning between foreground and background for semi-supervised video action detection

Authors
Zhang, Qiming [1]
Hu, Zhengping [1, 2, 3]
Wang, Yulu [1]
Zhang, Hehao [1]
Di, Jirui [1]
Affiliations
[1] Yanshan Univ, Sch Informat & Engn, Qinhuangdao 066004, Hebei, Peoples R China
[2] Yanshan Univ, Qinhuangdao 066004, Hebei, Peoples R China
[3] Hebei Key Lab Informat Transmiss & Signal Proc, Qinhuangdao 066004, Hebei, Peoples R China
Keywords
Semi-supervised learning; Video action detection; Multi-granularity; Contrastive learning; NETWORK
DOI
10.1016/j.knosys.2024.112853
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semi-supervised video action detection has received increasing attention because it lowers annotation cost while achieving performance comparable to fully supervised methods. However, because videos contain dynamic background regions, existing methods can be biased in how they interpret foreground and background. This bias causes the model either to mistake dynamic background areas for action foreground or to overlook background information, which leads to misjudgment of the foreground. To address this issue, this paper proposes Multi-FB, a multi-granularity hierarchical contrastive learning method between foreground and background for semi-supervised video action detection. First, it proposes a multi-granularity encoding network based on foreground and background. This network uses a unified encoder to represent and learn foreground and background regions in videos at different granularities, thereby improving the model's understanding of the action foreground and the related background. Second, it proposes an intra-model multi-granularity hierarchical contrastive strategy, which minimizes the representation discrepancies of foreground-to-foreground and background-to-background pairs at different granularities within the same video, while maximizing the representation differences between foreground and background at various granularities within that video. Third, it proposes a cross-model multi-granularity hierarchical contrastive strategy, which enhances the consistency of the model's representations of foregrounds and backgrounds between the original data and the augmented data. Extensive experiments on JHMDB-21 and UCF101-24 show that the proposed method clearly separates the feature representations of different categories and achieves performance comparable to state-of-the-art methods under semi-supervised conditions.
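
The abstract describes the two contrastive strategies only in words, so the following is a minimal NumPy sketch of the general idea and not the authors' formulation. It assumes an InfoNCE-style loss for the intra-model contrast (foreground-to-foreground and background-to-background pairs at any granularity are positives, foreground-to-background pairs are negatives) and a mean cosine distance for the cross-model consistency. The function names `intra_contrast` and `cross_consistency`, the `temperature` parameter, and the toy features are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def intra_contrast(fg_levels, bg_levels, temperature=0.1):
    """Hypothetical InfoNCE-style foreground/background contrast.

    fg_levels, bg_levels: lists of (n_i, d) arrays, one per granularity.
    Pairs sharing a region type (fg-fg or bg-bg) are positives at any
    granularity; fg-bg pairs act as negatives.
    """
    fg = np.concatenate(fg_levels, axis=0)
    bg = np.concatenate(bg_levels, axis=0)
    feats = l2_normalize(np.concatenate([fg, bg], axis=0))
    labels = np.concatenate([np.ones(len(fg)), np.zeros(len(bg))])

    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    pos = (labels[:, None] == labels[None, :]) & not_self

    # Row-wise log-softmax over all other samples (self-pairs masked out).
    sim = np.where(not_self, feats @ feats.T / temperature, -np.inf)
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_norm

    # Average log-probability of the positives for each anchor.
    pos_count = np.maximum(pos.sum(axis=1), 1)
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / pos_count
    return float(per_anchor.mean())


def cross_consistency(orig, aug):
    """Mean cosine distance between matching rows of two feature sets."""
    cos = (l2_normalize(orig) * l2_normalize(aug)).sum(axis=-1)
    return float((1.0 - cos).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    fg_dir, bg_dir = np.eye(d)[0], np.eye(d)[1]
    # Two toy granularities: a coarse level (2 regions) and a fine level (4).
    fg_levels = [fg_dir + 0.05 * rng.normal(size=(k, d)) for k in (2, 4)]
    bg_levels = [bg_dir + 0.05 * rng.normal(size=(k, d)) for k in (2, 4)]
    print("intra loss:", intra_contrast(fg_levels, bg_levels))

    # Stand-in for features of an augmented view of the same regions.
    fg_all = np.concatenate(fg_levels)
    fg_aug = fg_all + 0.01 * rng.normal(size=fg_all.shape)
    print("cross loss:", cross_consistency(fg_all, fg_aug))
```

Pooling all granularities into one positive set relies on the unified encoder mentioned in the abstract, since it presumes every level shares the same feature dimension. Under these assumptions the intra-model loss falls as foreground and background features form separate clusters, and the cross-model loss falls as the original and augmented views agree.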
Pages: 16