Exploring Denoised Cross-video Contrast for Weakly-supervised Temporal Action Localization

Cited by: 37
Authors
Li, Jingjing [1]
Yang, Tianyu [2]
Ji, Wei [1]
Wang, Jue [2]
Cheng, Li [1]
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
[2] Tencent AI Lab, Shenzhen, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
DOI
10.1109/CVPR52688.2022.01929
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Weakly-supervised temporal action localization aims to localize actions in untrimmed videos with only video-level labels. Most existing methods address this problem with a "localization-by-classification" pipeline that localizes action regions based on snippet-wise classification sequences. Snippet-wise classifications are unfortunately error prone due to the sparsity of video-level labels. Inspired by recent success in unsupervised contrastive representation learning, we propose a novel denoised cross-video contrastive algorithm, aiming to enhance the feature discrimination ability of video snippets for accurate temporal action localization in the weakly-supervised setting. This is enabled by three key designs: 1) an effective pseudo-label denoising module to alleviate the side effects caused by noisy contrastive features, 2) an efficient region-level feature contrast strategy with a region-level memory bank to capture "global" contrast across the entire dataset, and 3) a diverse contrastive learning strategy to enable action-background separation as well as intra-class compactness and inter-class separability. Extensive experiments on THUMOS14 and ActivityNet v1.3 demonstrate the superior performance of our approach.
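The abstract names a region-level feature contrast against a memory bank but gives no formulas. As a rough illustration only, the sketch below uses a generic InfoNCE-style loss, which is an assumption here and not necessarily the loss used in the paper. The names `memory_bank`, `info_nce`, and `contrast_against_bank`, the toy 2-D features, and the temperature of 0.07 are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positives, negatives, temperature=0.07):
    """Generic InfoNCE loss, averaged over all positives (illustrative only)."""
    q = l2_normalize(query)
    pos = [dot(q, l2_normalize(p)) / temperature for p in positives]
    neg = [dot(q, l2_normalize(n)) / temperature for n in negatives]
    losses = []
    for s in pos:
        logits = [s] + neg
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_denom - s)
    return sum(losses) / len(losses)


# Hypothetical memory bank: label -> region-level features gathered across videos.
memory_bank = {
    "action_a": [[1.0, 0.1], [0.9, 0.2]],
    "action_b": [[-1.0, 0.1]],
    "background": [[0.0, -1.0]],
}


def contrast_against_bank(query, label, bank, temperature=0.07):
    """Same-label regions act as positives; every other entry is a negative."""
    positives = bank[label]
    negatives = [f for k, feats in bank.items() if k != label for f in feats]
    return info_nce(query, positives, negatives, temperature)


# A region feature close to the "action_a" entries gets a lower loss
# than one pointing the opposite way.
loss_good = contrast_against_bank([1.0, 0.0], "action_a", memory_bank)
loss_bad = contrast_against_bank([-1.0, 0.0], "action_a", memory_bank)
```

In this toy setup, treating the background entries as negatives for every action class reflects action-background separation, while same-class positives and other-class negatives reflect intra-class compactness and inter-class separability. Pseudo-label denoising is not modeled here.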
Pages: 19882-19892
Page count: 11