Graph Convolutional Networks for Temporal Action Localization

Cited by: 376
Authors
Zeng, Runhao [1 ,2 ]
Huang, Wenbing [2 ,5 ]
Tan, Mingkui [1 ,4 ]
Rong, Yu [2 ]
Zhao, Peilin [2 ]
Huang, Junzhou [2 ]
Gan, Chuang [3 ]
Affiliations
[1] South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China
[2] Tencent AI Lab, Shenzhen, Peoples R China
[3] MIT, IBM Watson AI Lab, Cambridge, MA 02139 USA
[4] Peng Cheng Lab, Shenzhen, Peoples R China
[5] Tsinghua Univ, State Key Lab Intelligent Technol & Syst, Tsinghua Natl Lab Informat Sci & Technol TNList, Dept Comp Sci & Technol, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/ICCV.2019.00719
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most state-of-the-art action localization systems process each action proposal individually, without explicitly exploiting the relations between proposals during learning. However, these relations play an important role in action localization, since a meaningful action always consists of multiple proposals in a video. In this paper, we propose to exploit the proposal-proposal relations using Graph Convolutional Networks (GCNs). First, we construct an action proposal graph, where each proposal is represented as a node and the relation between two proposals as an edge. Here, we use two types of relations: one captures the context information for each proposal, and the other characterizes the correlations between distinct actions. Then we apply GCNs over the graph to model the relations among different proposals and learn powerful representations for action classification and localization. Experimental results show that our approach significantly outperforms the state of the art on THUMOS14 (49.1% versus 42.8%). Moreover, augmentation experiments on ActivityNet also verify the efficacy of modeling action proposal relationships.
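The graph construction and convolution step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The abstract does not say how the two relation types are decided, so the sketch assumes that heavily overlapping proposals (temporal IoU above `ctx_thresh`) are linked as context, and that disjoint but nearby proposals (normalised centre distance below `sur_thresh`) are linked as correlated actions. The function names, both thresholds, the distance metric, and the mean-aggregation layer are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Proposal = Tuple[float, float]  # (start, end) of a temporal segment, in seconds


def tiou(a: Proposal, b: Proposal) -> float:
    """Temporal intersection-over-union of two segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def center_distance(a: Proposal, b: Proposal) -> float:
    """Centre-to-centre distance divided by the summed lengths (assumed metric)."""
    total = (a[1] - a[0]) + (b[1] - b[0])
    gap = abs((a[0] + a[1]) / 2 - (b[0] + b[1]) / 2)
    return gap / total if total > 0 else float("inf")


def build_adjacency(proposals: List[Proposal],
                    ctx_thresh: float = 0.7,   # assumed threshold
                    sur_thresh: float = 1.0    # assumed threshold
                    ) -> List[List[float]]:
    """Binary adjacency matrix with self-loops over the proposal graph."""
    n = len(proposals)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop keeps each node's own feature
        for j in range(i + 1, n):
            overlap = tiou(proposals[i], proposals[j])
            contextual = overlap > ctx_thresh
            surrounding = (overlap == 0.0 and
                           center_distance(proposals[i], proposals[j]) < sur_thresh)
            if contextual or surrounding:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_layer(adj: List[List[float]],
              feats: List[List[float]],
              weight: List[List[float]]) -> List[List[float]]:
    """One layer: average features over neighbours, apply weight, then ReLU."""
    n_in, n_out = len(weight), len(weight[0])
    out = []
    for row in adj:
        degree = sum(row)  # at least 1 because of the self-loop
        agg = [sum(row[k] * feats[k][f] for k in range(len(feats))) / degree
               for f in range(n_in)]
        out.append([max(0.0, sum(agg[f] * weight[f][o] for f in range(n_in)))
                    for o in range(n_out)])
    return out


# Toy example: two overlapping proposals, one nearby, one far away.
proposals = [(0.0, 10.0), (1.0, 10.5), (12.0, 15.0), (40.0, 50.0)]
feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

adj = build_adjacency(proposals)
out = gcn_layer(adj, feats, identity)
```

In this toy input the first three proposals end up mutually connected and the fourth stays isolated, so the layer averages features within the first group and leaves the fourth proposal with only its own (rectified) feature. A learned model would replace the identity weight with trained parameters and feed the output to classification and boundary-regression heads.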
Pages: 7093 - 7102
Number of pages: 10
Related papers
50 in total
  • [1] Graph Convolutional Module for Temporal Action Localization in Videos
    Zeng, Runhao
    Huang, Wenbing
    Tan, Mingkui
    Rong, Yu
    Zhao, Peilin
    Huang, Junzhou
    Gan, Chuang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (10) : 6209 - 6223
  • [2] Action Recognition Based on Spatial Temporal Graph Convolutional Networks
    Zheng, Wanqiang
    Jing, Punan
    Xu, Qingyang
    PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND APPLICATION ENGINEERING (CSAE2019), 2019,
  • [3] Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation
    Ghosh, Pallabi
    Yao, Yi
    Davis, Larry S.
    Divakaran, Ajay
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 565 - 574
  • [4] Using BlazePose on Spatial Temporal Graph Convolutional Networks for Action Recognition
    Alsawadi, Motasem S.
    El-Kenawy, El-Sayed M.
    Rio, Miguel
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (01) : 19 - 36
  • [5] Deep Concept-wise Temporal Convolutional Networks for Action Localization
    Li, Xin
    Lin, Tianwei
    Liu, Xiao
    Zuo, Wangmeng
    Li, Chao
    Long, Xiang
    He, Dongliang
    Li, Fu
    Wen, Shilei
    Gan, Chuang
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4004 - 4012
  • [6] Temporal segment graph convolutional networks for skeleton-based action recognition
    Ding, Chongyang
    Wen, Shan
    Ding, Wenwen
    Liu, Kai
    Belyaev, Evgeny
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 110
  • [7] Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
    Yan, Sijie
    Xiong, Yuanjun
    Lin, Dahua
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 7444 - 7452
  • [8] Involving Distinguished Temporal Graph Convolutional Networks for Skeleton-Based Temporal Action Segmentation
    Li, Yun-Heng
    Liu, Kai-Yuan
    Liu, Sheng-Lan
    Feng, Lin
    Qiao, Hong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 647 - 660
  • [9] CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos
    Shou, Zheng
    Chan, Jonathan
    Zareian, Alireza
    Miyazawa, Kazuyuki
    Chang, Shih-Fu
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 1417 - 1426
  • [10] CGCN: Context graph convolutional network for few-shot temporal action localization
    Zhang, Shihui
    Wang, Houlin
    Wang, Lei
    Han, Xueqiang
    Tian, Qing
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (01)