Temporal Deformable Transformer for Action Localization

Cited by: 0
Authors
Wang, Haoying [1]
Wei, Ping [1]
Liu, Meiqin [1]
Zheng, Nanning [1]
Affiliations
[1] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Temporal Action Localization; Transformer; Deformable Attention; Video Understanding
DOI
10.1007/978-3-031-44223-0_45
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Temporal action localization (TAL) is a challenging task that has received significant attention in video understanding. Recently, Transformer-based models have demonstrated their effectiveness in capturing contextual information and achieved outstanding performance on various TAL benchmarks. However, these methods still face challenges in computational efficiency and contextual modeling rigidity. In this paper, we propose a method to address those problems in Transformer-based models. Our model introduces a temporal deformable Transformer module and the corresponding time normalization, enabling flexible aggregation of temporal context information in videos, leading to enhanced video representations. To demonstrate the effectiveness of the proposed method, we construct a Transformer-based anchor-free model with a simple prediction head, which yields superior performance on widely used benchmarks. Specifically, it achieves an average mAP of 67.4% on THUMOS14 and an average mAP of 36.8% on ActivityNet-v1.3.
Pages: 563-575
Page count: 13
Related papers
50 in total
  • [21] Deformable graph convolutional transformer for skeleton-based action recognition
    Shuo Chen
    Ke Xu
    Bo Zhu
    Xinghao Jiang
    Tanfeng Sun
    Applied Intelligence, 2023, 53 : 15390 - 15406
  • [22] Precise Temporal Action Localization by Evolving Temporal Proposals
    Qiu, Haonan
    Zheng, Yingbin
    Ye, Hao
    Lu, Yao
    Wang, Feng
    He, Liang
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 388 - 396
  • [23] Relation Attention for Temporal Action Localization
    Chen, Peihao
    Gan, Chuang
    Shen, Guangyao
    Huang, Wenbing
    Zeng, Runhao
    Tan, Mingkui
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (10) : 2723 - 2733
  • [24] Deformable Video Transformer
    Wang, Jue
    Torresani, Lorenzo
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 14033 - 14042
  • [25] End-to-End Temporal Action Detection With Transformer
    Liu, Xiaolong
    Wang, Qimeng
    Hu, Yao
    Tang, Xu
    Zhang, Shiwei
    Bai, Song
    Bai, Xiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 5427 - 5441
  • [26] Learning Disentangled Classification and Localization Representations for Temporal Action Localization
    Zhu, Zixin
    Wang, Le
    Tang, Wei
    Liu, Ziyi
    Zheng, Nanning
    Hua, Gang
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 3644 - 3652
  • [27] Efficient temporal action localization with temporal attention and gaussian weight
    Sun, Mengbo
    Song, Yonghong
    Wang, Hongda
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [28] Complementary Temporal Classification Activation Maps in Temporal Action Localization
    Wang, Lijuan
    Zhu, Suguo
    Li, Zhihao
    Fang, Zhenying
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2021, PT II, 2021, 13020 : 373 - 384
  • [29] MTSN: Multiscale Temporal Similarity Network for Temporal Action Localization
    Jin, Xiaodong
    Zhang, Taiping
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2573 - 2581
  • [30] RETHINKING TEMPORAL STRUCTURE MODELING METHOD FOR TEMPORAL ACTION LOCALIZATION
    Li, Hongru
    Yang, Jianxing
    Zhou, Yuan
    Li, Sumei
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3676 - 3680