Temporal Action Localization With Coarse-to-Fine Network

Cited by: 2
Authors
Zhang, Min [1]
Hu, Haiyang [2]
Li, Zhongjin [2]
Affiliations
[1] Zhejiang Ind Polytech Coll, Dept Design & Art, Shaoxing 312000, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou 310018, Peoples R China
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Learning systems; Videos; Location awareness; Transformers; Feature extraction; Logic gates; Temporal action localization; action detection; action granularity; progressive learning;
DOI
10.1109/ACCESS.2022.3205594
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Precisely localizing the temporal interval of each action segment in long raw videos is an essential challenge in practical video content analysis (e.g., activity detection or video caption generation). Most previous works neglect the hierarchical action granularity and eventually fail to identify precise action boundaries (e.g., embracing approaching or turning a screw in mechanical maintenance). In this paper, we introduce a simple yet efficient coarse-to-fine network (CFNet) to solve the challenging issue of temporal action localization by progressively refining action boundaries at multiple action granularities. The proposed CFNet is mainly composed of three components: a coarse proposal module (CPM) to generate coarse action candidates, a fusion block (FB) to enhance feature representation by fusing the coarse candidate features and the corresponding features of raw input frames, and a boundary transformer module (BTM) to further refine action boundaries. Specifically, CPM exploits framewise, matching and gated actionness curves that complement each other for coarse candidate generation at different levels, while FB is devised to enrich feature representation by fusing the last feature map of CPM and the corresponding raw frame input. Finally, BTM learns long-term temporal dependency with a transformer structure to further refine action boundaries at a finer granularity. Thus, fine-grained action intervals can be obtained incrementally. Compared with previous state-of-the-art techniques, the proposed coarse-to-fine network can asymptotically approach fine-grained action boundaries. Comprehensive experiments are conducted on the publicly available THUMOS14 and ActivityNet-v1.3 datasets, and they show the improvements of our method over prior methods on various video action parsing tasks.
Pages: 96378 - 96387
Number of pages: 10