Temporal Query Networks for Fine-grained Video Understanding

Cited by: 50
Authors
Zhang, Chuhan [1]
Gupta, Ankush [2]
Zisserman, Andrew [1]
Affiliations
[1] Univ Oxford, Oxford, England
[2] DeepMind, London, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
DOI
10.1109/CVPR46437.2021.00446
CLC classification
TP18 (theory of artificial intelligence)
Discipline codes
081104; 0812; 0835; 1405
Abstract
Our objective in this work is fine-grained classification of actions in untrimmed videos, where the actions may be temporally extended or may span only a few frames of the video. We cast this into a query-response mechanism, where each query addresses a particular question and has its own response label set. We make the following four contributions: (i) We propose a new model, the Temporal Query Network (TQN), which enables the query-response functionality and a structural understanding of fine-grained actions. It attends to relevant segments for each query with a temporal attention mechanism, and can be trained using only the labels for each query. (ii) We propose a new training method, a stochastic feature bank update, to train a network on videos of various lengths with the dense sampling required to respond to fine-grained queries. (iii) We compare the TQN to other architectures and text supervision methods, and analyze their pros and cons. Finally, (iv) we evaluate the method extensively on the FineGym and Diving48 benchmarks for fine-grained action classification and surpass the state-of-the-art using only RGB features. Project page: https://www.robots.ox.ac.uk/~vgg/research/tqn/.
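The query-response mechanism described in the abstract can be illustrated with a minimal sketch: a set of learned query vectors, one per question, each attending over dense per-frame features with softmax temporal attention to produce a response vector. This is a hedged illustration, not the authors' implementation; the function name `temporal_query_attention` and the plain dot-product attention are assumptions for exposition (the real TQN uses a transformer decoder and per-query classifiers over each query's label set).

```python
import numpy as np

def temporal_query_attention(frame_feats, queries):
    """Sketch of a query-response readout over video frame features.

    frame_feats: (T, D) dense per-frame features from a video backbone
    queries:     (K, D) one learned vector per query (question)
    Returns:     (K, D) one response vector per query, each a
                 temporal-attention-weighted average of the frames.
    """
    scores = queries @ frame_feats.T             # (K, T) query-frame similarity
    scores -= scores.max(axis=1, keepdims=True)  # subtract max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over time axis
    return weights @ frame_feats                 # (K, D) per-query responses
```

Each response vector would then be scored by that query's own classifier head, so the model is trainable from per-query labels alone, without temporal annotations.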
Pages: 4484 - 4494
Page count: 11
Related Papers (50 records)
  • [1] FiGO: Fine-Grained Query Optimization in Video Analytics
    Cao, Jiashen
    Sarkar, Karan
    Hadidi, Ramyad
    Arulraj, Joy
    Kim, Hyesoon
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 559 - 572
  • [2] Generating fine-grained surrogate temporal networks
    Longa, A.
    Cencetti, G.
    Lehmann, S.
    Passerini, A.
    Lepri, B.
    COMMUNICATIONS PHYSICS, 2024, 7 (01)
  • [4] FineAction: A Fine-Grained Video Dataset for Temporal Action Localization
    Liu, Yi
    Wang, Limin
    Wang, Yali
    Ma, Xiao
    Qiao, Yu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 6937 - 6950
  • [6] FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding
    Shao, Dian
    Zhao, Yue
    Dai, Bo
    Lin, Dahua
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 2613 - 2622
  • [7] Fine-grained scalable video broadcasting over cellular networks
    Liu, JC
    Li, B
    Li, B
    Cao, XR
    IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL I AND II, PROCEEDINGS, 2002, : 417 - 420
  • [8] WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-Grained Spatial-Temporal Understanding
    Kong, Quan
    Kawana, Yuki
    Saini, Rajat
    Kumar, Ashutosh
    Pan, Jingjing
    Gu, Ta
    Ozao, Yohei
    Opra, Balazs
    Sato, Yoichi
    Kobori, Norimasa
    COMPUTER VISION - ECCV 2024, PT LXXVI, 2025, 15134 : 1 - 18
  • [9] A Fine-Grained Spatial-Temporal Attention Model for Video Captioning
    Liu, An-An
    Qiu, Yurui
    Wong, Yongkang
    Su, Yu-Ting
    Kankanhalli, Mohan
    IEEE ACCESS, 2018, 6 : 68463 - 68471
  • [10] ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning
    Kordopatis-Zilos, Giorgos
    Papadopoulos, Symeon
    Patras, Ioannis
    Kompatsiaris, Ioannis
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 6360 - 6369