Temporal Deformable Transformer for Action Localization

Authors
Wang, Haoying [1]
Wei, Ping [1]
Liu, Meiqin [1]
Zheng, Nanning [1]
Affiliations
[1] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Temporal Action Localization; Transformer; Deformable Attention; Video Understanding;
DOI
10.1007/978-3-031-44223-0_45
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Temporal action localization (TAL) is a challenging task that has received significant attention in video understanding. Recently, Transformer-based models have demonstrated their effectiveness in capturing contextual information and achieved outstanding performance on various TAL benchmarks. However, these methods still face challenges in computational efficiency and contextual modeling rigidity. In this paper, we propose a method to address those problems in Transformer-based models. Our model introduces a temporal deformable Transformer module and the corresponding time normalization, enabling flexible aggregation of temporal context information in videos, leading to enhanced video representations. To demonstrate the effectiveness of the proposed method, we construct a Transformer-based anchor-free model with a simple prediction head, which yields superior performance on widely used benchmarks. Specifically, it achieves an average mAP of 67.4% on THUMOS14 and an average mAP of 36.8% on ActivityNet-v1.3.
Pages: 563-575
Number of pages: 13