Multigranularity Feature Aggregation and Cross-level Boundary Modeling for Temporal Action Detection

Citations: 0

Authors:

Li, Qiang [1,2]

Liu, Di [1,3]

Zu, Guang [4]

Li, Sen [1]

Sun, Hui [2]

Wang, Jianzhong [1]

Affiliations:

[1] Northeast Normal Univ, Sch Informat Sci & Technol, Changchun, Peoples R China

[2] Changchun Humanities & Sci Coll, Changchun, Peoples R China

[3] Northeast Elect Power Univ, Jilin, Peoples R China

[4] Jilin Univ, Sch Artificial Intelligence, Changchun, Peoples R China

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2025 / Vol. 21 / No. 3

Funding:

National Natural Science Foundation of China;

Keywords:

Temporal action detection; action recognition; vision transformers; TRANSFORMER;

DOI:

10.1145/3712598

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812 ;

Abstract:

This article presents a Temporal Action Detection (TAD) method with Multigranularity (MG) feature aggregation and Cross-level Boundary Modeling (CBM). Compared with other methods, our proposed approach has the following advantages. First, unlike most existing works, which consider only the local temporal context, a simple and computationally efficient MG module is proposed to comprehensively extract video features at instant, local, and global temporal granularities. Second, unlike methods that employ information from only a single feature pyramid level for action boundary regression, a CBM strategy that integrates the relative information from both same-level and higher-level features is designed to improve the accuracy of boundary prediction. Finally, benefiting from the MG module and CBM strategy, our method outperforms other state-of-the-art approaches on five challenging TAD datasets: THUMOS14, MultiTHUMOS, EPIC-KITCHENS-100, ActivityNet-1.3, and HACS. We make our code and pre-trained model publicly available.

CCS Concepts: • Computing methodologies → Artificial intelligence; Computer vision tasks; Activity recognition and understanding

Pages: 24
