Temporal Action Localization With Coarse-to-Fine Network

Cited by: 2
Authors
Zhang, Min [1]
Hu, Haiyang [2]
Li, Zhongjin [2]
Affiliations
[1] Zhejiang Ind Polytech Coll, Dept Design & Art, Shaoxing 312000, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou 310018, Peoples R China
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Learning systems; Videos; Location awareness; Transformers; Feature extraction; Logic gates; Temporal action localization; action detection; action granularity; progressive learning
DOI
10.1109/ACCESS.2022.3205594
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Precisely localizing the temporal interval of each action segment in long raw videos is an essential challenge in practical video content analysis (e.g., activity detection or video caption generation). Most previous works neglect hierarchical action granularity (e.g., embracing approaching or turning a screw in mechanical maintenance) and consequently fail to identify precise action boundaries. In this paper, we introduce a simple yet efficient coarse-to-fine network (CFNet) that addresses temporal action localization by progressively refining action boundaries at multiple action granularities. The proposed CFNet is composed of three main components: a coarse proposal module (CPM) that generates coarse action candidates, a fusion block (FB) that enhances the feature representation by fusing the coarse candidate features with the corresponding features of the raw input frames, and a boundary transformer module (BTM) that further refines action boundaries. Specifically, CPM exploits framewise, matching, and gated actionness curves that complement each other to generate coarse candidates at different levels, while FB enriches the feature representation by fusing the last feature map of CPM with the corresponding raw frame input. Finally, BTM learns long-term temporal dependencies with a transformer structure to refine action boundaries at a finer granularity. In this way, fine-grained action intervals are obtained incrementally. Compared with previous state-of-the-art techniques, the proposed coarse-to-fine network can asymptotically approach fine-grained action boundaries. Comprehensive experiments on the publicly available THUMOS14 and ActivityNet-v1.3 datasets show that our method improves on prior methods across various video action parsing tasks.
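The idea of turning an actionness curve into coarse candidates can be illustrated with a minimal sketch. This is not the authors' CPM: it assumes a single per-frame actionness score in [0, 1] and a fixed threshold, and the function names `coarse_proposals` and `temporal_iou` are hypothetical. It groups consecutive above-threshold frames into candidate intervals and scores a candidate against a reference interval with temporal IoU, the overlap measure commonly used to evaluate temporal action localization.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open frame interval [start, end)


def coarse_proposals(actionness: List[float], threshold: float = 0.5) -> List[Interval]:
    """Group consecutive frames with actionness >= threshold into intervals.

    Assumption: one actionness score per frame; the threshold is illustrative.
    """
    proposals: List[Interval] = []
    start = None
    for t, score in enumerate(actionness):
        if score >= threshold and start is None:
            start = t                      # an action candidate begins
        elif score < threshold and start is not None:
            proposals.append((start, t))   # the candidate ends before frame t
            start = None
    if start is not None:                  # candidate runs to the last frame
        proposals.append((start, len(actionness)))
    return proposals


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection over union of two half-open temporal intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    curve = [0.1, 0.7, 0.9, 0.2, 0.6, 0.8]
    candidates = coarse_proposals(curve)
    print(candidates)
    print(temporal_iou(candidates[0], (2, 4)))
```

A refinement stage such as the BTM described above would then adjust the start and end of each coarse candidate; that step is learned and is not reproduced in this sketch.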
Pages: 96378-96387
Number of pages: 10