Temporal Deformable Transformer for Action Localization

Cited by: 0
Authors
Wang, Haoying [1]
Wei, Ping [1]
Liu, Meiqin [1]
Zheng, Nanning [1]
Affiliations
[1] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Temporal Action Localization; Transformer; Deformable Attention; Video Understanding
DOI
10.1007/978-3-031-44223-0_45
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Temporal action localization (TAL) is a challenging task that has received significant attention in video understanding. Recently, Transformer-based models have demonstrated their effectiveness in capturing contextual information and achieved outstanding performance on various TAL benchmarks. However, these methods still face challenges in computational efficiency and contextual modeling rigidity. In this paper, we propose a method to address those problems in Transformer-based models. Our model introduces a temporal deformable Transformer module and the corresponding time normalization, enabling flexible aggregation of temporal context information in videos, leading to enhanced video representations. To demonstrate the effectiveness of the proposed method, we construct a Transformer-based anchor-free model with a simple prediction head, which yields superior performance on widely used benchmarks. Specifically, it achieves an average mAP of 67.4% on THUMOS14 and an average mAP of 36.8% on ActivityNet-v1.3.
Pages: 563-575
Number of pages: 13