Temporal Action Localization With Coarse-to-Fine Network

Cited by: 2
Authors
Zhang, Min [1]
Hu, Haiyang [2]
Li, Zhongjin [2]
Affiliations
[1] Zhejiang Ind Polytech Coll, Dept Design & Art, Shaoxing 312000, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou 310018, Peoples R China
Source
IEEE Access | 2022 / Vol. 10
Keywords
Learning systems; Videos; Location awareness; Transformers; Feature extraction; Logic gates; Temporal action localization; action detection; action granularity; progressive learning
DOI
10.1109/ACCESS.2022.3205594
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Precisely localizing the temporal interval of each action segment in long raw videos is an essential challenge in practical video content analysis (e.g., activity detection or video caption generation). Most previous works neglect hierarchical action granularity (e.g., embracing approaching or turning a screw in mechanical maintenance) and eventually fail to identify precise action boundaries. In this paper, we introduce a simple yet efficient coarse-to-fine network (CFNet) that addresses temporal action localization by progressively refining action boundaries at multiple action granularities. The proposed CFNet is mainly composed of three components: a coarse proposal module (CPM) to generate coarse action candidates, a fusion block (FB) to enhance the feature representation by fusing the coarse candidate features with the corresponding features of the raw input frames, and a boundary transformer module (BTM) to further refine action boundaries. Specifically, CPM exploits framewise, matching, and gated actionness curves that complement each other for coarse candidate generation at different levels, while FB enriches the feature representation by fusing the last feature map of CPM with the corresponding raw frame input. Finally, BTM learns long-term temporal dependencies with a transformer structure to refine action boundaries at a finer granularity. Thus, fine-grained action intervals can be obtained incrementally. Compared with previous state-of-the-art techniques, the proposed coarse-to-fine network can asymptotically approach fine-grained action boundaries. Comprehensive experiments are conducted on the publicly available THUMOS14 and ActivityNet-v1.3 datasets and show notable improvements of our method over prior methods on various video action parsing tasks.
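The coarse-to-fine idea in the abstract can be illustrated with a toy sketch. This is not the CFNet implementation: the abstract does not specify how the actionness curves are computed or combined, and the learned BTM is replaced here by a simple hand-written rule. The function names `coarse_proposals` and `refine_boundaries`, the thresholds, and the example scores are all hypothetical. The sketch only shows the general pattern of first proposing rough segments from a per-frame actionness curve and then adjusting each boundary at a finer level.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open frame interval [start, end)


def coarse_proposals(actionness: List[float], threshold: float = 0.5) -> List[Segment]:
    """Coarse stage (assumed rule): group consecutive frames whose
    actionness score is at least `threshold` into candidate segments."""
    proposals: List[Segment] = []
    start = None
    for t, score in enumerate(actionness):
        if score >= threshold and start is None:
            start = t                      # a segment opens here
        elif score < threshold and start is not None:
            proposals.append((start, t))   # the segment closes before frame t
            start = None
    if start is not None:                  # segment still open at the end
        proposals.append((start, len(actionness)))
    return proposals


def refine_boundaries(actionness: List[float], proposal: Segment,
                      low_threshold: float = 0.3) -> Segment:
    """Fine stage (hand-written stand-in for a learned refiner): extend each
    boundary outward while neighbouring frames still score >= `low_threshold`."""
    start, end = proposal
    while start > 0 and actionness[start - 1] >= low_threshold:
        start -= 1
    while end < len(actionness) and actionness[end] >= low_threshold:
        end += 1
    return (start, end)


# Made-up per-frame actionness scores, for illustration only.
scores = [0.1, 0.4, 0.8, 0.9, 0.7, 0.35, 0.1, 0.2, 0.6, 0.1]
coarse = coarse_proposals(scores, threshold=0.5)
fine = [refine_boundaries(scores, p, low_threshold=0.3) for p in coarse]
print(coarse)  # [(2, 5), (8, 9)]
print(fine)    # [(1, 6), (8, 9)]
```

In this toy example the first coarse segment grows from frames 2-4 to frames 1-5 because its neighbouring frames score above the lower threshold, while the second segment is left unchanged.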
Pages: 96378-96387
Number of pages: 10