Temporal Action Localization With Coarse-to-Fine Network

Cited by: 2

Authors:

Zhang, Min [1]

Hu, Haiyang [2]

Li, Zhongjin [2]

Affiliations:

[1] Zhejiang Ind Polytech Coll, Dept Design & Art, Shaoxing 312000, Peoples R China

[2] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou 310018, Peoples R China

Source:

IEEE ACCESS | 2022 / Vol. 10

Keywords:

Learning systems; Videos; Location awareness; Transformers; Feature extraction; Logic gates; Temporal action localization; action detection; action granularity; progressive learning;

DOI:

10.1109/ACCESS.2022.3205594

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Precisely localizing the temporal interval of each action segment in long raw videos is an essential challenge in practical video content analysis (e.g., activity detection or video caption generation). Most previous works neglect the hierarchical granularity of actions (e.g., embracing, approaching, or turning a screw in mechanical maintenance) and consequently fail to identify precise action boundaries. In this paper, we introduce a simple yet efficient coarse-to-fine network (CFNet) that addresses temporal action localization by progressively refining action boundaries at multiple action granularities. The proposed CFNet is composed of three main components: a coarse proposal module (CPM) that generates coarse action candidates, a fusion block (FB) that enhances the feature representation by fusing the coarse candidate features with the corresponding features of the raw input frames, and a boundary transformer module (BTM) that further refines action boundaries. Specifically, CPM uses framewise, matching, and gated actionness curves that complement each other to generate coarse candidates at different levels, while FB enriches the feature representation by fusing the last feature map of CPM with the corresponding raw frame input. Finally, BTM learns long-term temporal dependencies with a transformer structure to refine action boundaries at a finer granularity. In this way, fine-grained action intervals are obtained incrementally. Compared with previous state-of-the-art techniques, the proposed coarse-to-fine network can asymptotically approach fine-grained action boundaries. Comprehensive experiments on the publicly available THUMOS14 and ActivityNet-v1.3 datasets show that our method improves on prior methods across various video action parsing tasks.
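The coarse-to-fine idea in the abstract can be illustrated with a toy sketch. This is not the CFNet implementation: there is no learned CPM, FB, or transformer here. It only assumes that an actionness curve is available at two temporal resolutions, and every name in it (`coarse_proposals`, `refine_boundary`, `coarse_to_fine`, `threshold`, `stride`) is hypothetical. Coarse spans come from thresholding the low-resolution curve, and each boundary is then moved to the sharpest rise or fall of the high-resolution curve nearby.

```python
from typing import List, Tuple


def coarse_proposals(actionness: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group consecutive steps whose score exceeds `threshold` into [start, end) spans."""
    spans = []
    start = None
    for t, score in enumerate(actionness):
        if score > threshold and start is None:
            start = t                      # a span opens here
        elif score <= threshold and start is not None:
            spans.append((start, t))       # the span closes before step t
            start = None
    if start is not None:                  # span still open at the end of the curve
        spans.append((start, len(actionness)))
    return spans


def refine_boundary(fine: List[float], center: int, radius: int, rising: bool) -> int:
    """Move a boundary to the largest rise (or fall) of `fine` within `radius` frames."""
    lo = max(1, center - radius)
    hi = min(len(fine) - 1, center + radius)
    best_t, best_delta = center, float("-inf")
    for t in range(lo, hi + 1):
        delta = fine[t] - fine[t - 1]
        if not rising:
            delta = -delta                 # for an end boundary, look for the steepest drop
        if delta > best_delta:
            best_t, best_delta = t, delta
    return best_t


def coarse_to_fine(coarse: List[float], fine: List[float], stride: int,
                   threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Assumption: one coarse step covers `stride` fine frames."""
    results = []
    for s, e in coarse_proposals(coarse, threshold):
        start = refine_boundary(fine, s * stride, stride, rising=True)
        end = refine_boundary(fine, e * stride, stride, rising=False)
        results.append((start, end))
    return results


if __name__ == "__main__":
    coarse = [0.1, 0.9, 0.8, 0.2]                  # 4 coarse steps, stride 4
    fine = [0.0] * 6 + [1.0] * 5 + [0.0] * 5       # 16 frames; action occupies frames 6..10
    print(coarse_to_fine(coarse, fine, stride=4))  # [(6, 11)]
```

In this toy input the coarse span covers frames 4 to 12, and the refinement step tightens it to frames 6 to 11 (end exclusive), which matches where the fine curve actually rises and falls.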

Pages: 96378-96387

Page count: 10

50 items in total

[31] COARSE-TO-FINE AGGREGATION FOR CROSS-GRANULARITY ACTION RECOGNITION
Mazari, Ahmed
Sahbi, Hichem
2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1541 - 1545
[32] HUMAN-AWARE COARSE-TO-FINE ONLINE ACTION DETECTION
Yang, Zichen
Huang, Di
Qin, Jie
Wang, Yunhong
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2455 - 2459
[34] Coarse-to-Fine Spatial-Temporal Relationship Inference for Temporal Sentence Grounding
Qi, Shanshan
Yang, Luxi
Li, Chunguo
Huang, Yongming
IEEE ACCESS, 2021, 9 : 97430 - 97443
[35] Coarse-to-fine multiscale fusion network for single image deraining
Zhang, Jiahao
Zhang, Juan
Wu, Xing
Shi, Zhicai
Hwang, Jenq-Neng
JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (04)
[36] Cancer metastasis fast location based on coarse-to-fine network
Wang, Rui
Gu, Yun
Yang, Jie
2022 ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING (CACML 2022), 2022, : 223 - 227
[37] Coarse-to-Fine Multi-camera Network Topology Estimation
Xing, Chang
Bai, Sichen
Zhou, Yi
Zhou, Zhong
Wu, Wei
ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT II, 2018, 10736 : 981 - 990
[38] Coarse-to-Fine Deep Neural Network for Fast Pedestrian Detection
Li, Yaobin
Yang, Xinmei
Cao, Lijun
LIDAR IMAGING DETECTION AND TARGET RECOGNITION 2017, 2017, 10605
[39] A Coarse-to-Fine Dual Attention Network for Blind Face Completion
Hoermann, Stefan
Xia, Zhibing
Knoche, Martin
Rigoll, Gerhard
2021 16TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2021), 2021
[40] Coarse-to-Fine Semantic Alignment for Cross-Modal Moment Localization
Hu, Yupeng
Nie, Liqiang
Liu, Meng
Wang, Kun
Wang, Yinglong
Hua, Xian-Sheng
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 5933 - 5943