Temporal Deformable Transformer for Action Localization

Cited by: 0

Authors:

Wang, Haoying [1]

Wei, Ping [1]

Liu, Meiqin [1]

Zheng, Nanning [1]

Affiliations:

[1] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an, People's Republic of China

Source:

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VI | 2023, vol. 14259

Funding:

National Natural Science Foundation of China

Keywords:

Temporal Action Localization; Transformer; Deformable Attention; Video Understanding;

DOI:

10.1007/978-3-031-44223-0_45

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Temporal action localization (TAL) is a challenging task that has received significant attention in video understanding. Recently, Transformer-based models have demonstrated their effectiveness in capturing contextual information and achieved outstanding performance on various TAL benchmarks. However, these methods still face challenges in computational efficiency and contextual modeling rigidity. In this paper, we propose a method to address those problems in Transformer-based models. Our model introduces a temporal deformable Transformer module and the corresponding time normalization, enabling flexible aggregation of temporal context information in videos, leading to enhanced video representations. To demonstrate the effectiveness of the proposed method, we construct a Transformer-based anchor-free model with a simple prediction head, which yields superior performance on widely used benchmarks. Specifically, it achieves an average mAP of 67.4% on THUMOS14 and an average mAP of 36.8% on ActivityNet-v1.3.
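The abstract does not give the module's equations, so the following is only a minimal sketch of what one-dimensional (temporal) deformable attention generally looks like, not the authors' implementation. Each time step predicts K fractional offsets and K attention weights from its own feature, samples the value sequence at those shifted positions by linear interpolation, and sums the samples, so the cost grows with T × K rather than T × T. The names `temporal_deformable_attention`, `linear_sample`, `w_offset`, `w_attn`, and `w_value` are hypothetical, the single-head NumPy formulation is an assumption, and the paper's time normalization is not modelled here.

```python
import numpy as np


def linear_sample(values, positions):
    """Linearly interpolate `values` (T, C) at fractional time positions.

    Positions outside [0, T - 1] are clipped to the sequence boundary.
    The result has shape positions.shape + (C,).
    """
    T = values.shape[0]
    pos = np.clip(positions, 0.0, T - 1.0)
    left = np.floor(pos).astype(int)
    right = np.minimum(left + 1, T - 1)
    frac = (pos - left)[..., None]
    return (1.0 - frac) * values[left] + frac * values[right]


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_deformable_attention(x, w_offset, w_attn, w_value):
    """Single-head 1D deformable attention (illustrative assumption).

    x:        (T, C) per-frame video features
    w_offset: (C, K) projection producing K offsets per frame, in frames
    w_attn:   (C, K) projection producing K attention logits per frame
    w_value:  (C, C) value projection
    Returns an array of shape (T, C).
    """
    T = x.shape[0]
    values = x @ w_value                            # (T, C)
    offsets = x @ w_offset                          # (T, K)
    weights = softmax(x @ w_attn, axis=-1)          # (T, K), rows sum to 1
    reference = np.arange(T, dtype=float)[:, None]  # (T, 1) frame indices
    sampled = linear_sample(values, reference + offsets)  # (T, K, C)
    return (weights[..., None] * sampled).sum(axis=1)     # (T, C)


# Usage: 8 frames, 4 channels, K = 3 sampling points per frame.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4))
output = temporal_deformable_attention(
    features,
    w_offset=rng.normal(size=(4, 3)),
    w_attn=rng.normal(size=(4, 3)),
    w_value=rng.normal(size=(4, 4)),
)
```

Because every frame attends to only K learned positions instead of all T frames, a sketch like this illustrates how deformable sampling can ease both issues the abstract raises: the quadratic cost of full self-attention and a fixed, rigid context window.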

Pages: 563-575

Number of pages: 13
