C2F: An effective coarse-to-fine network for video summarization

Cited by: 4
Authors
Jin, Ye [1]
Tian, Xiaoyan [1]
Zhang, Zhao [2]
Liu, Peng [1]
Tang, Xianglong [1]
Affiliations
[1] Harbin Inst Technol, Fac Comp, Harbin 150001, Peoples R China
[2] Harbin Inst Technol, Sch Instrument Sci & Engn, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China; Natural Science Foundation of Heilongjiang Province
Keywords
Video summarization; Coarse-to-fine network; Multiscale representation; Local adaptive loss
DOI
10.1016/j.imavis.2024.104962
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The objective of video summarization is to produce a concise summary that accurately captures the content of the original video. Current supervised video summarization methods treat the task as a sequence-to-sequence problem. However, modeling the temporal order of long videos presents three challenges: (1) local and global relationships are hard to capture simultaneously; (2) the boundaries of video highlight segments are often located incorrectly, which leaves the semantic integrity of the summary incomplete; (3) relations are difficult to compute efficiently. To address these limitations, we design a novel coarse-to-fine network (C2F) for video summarization that is adapted to the multi-level semantic structure of video. A multiscale representation scheme first captures temporal relationships at different scales to produce coarse classification results; an action-wise proposal module then provides fine predictions of importance scores and regresses the temporal locations of key-frames. In addition, we propose a loss function that identifies local differences among frames, and we analyze combinations of various loss functions. Extensive experiments on two benchmark datasets demonstrate that the proposed C2F performs strongly compared with state-of-the-art methods and computes relations efficiently. For example, on the TVSum dataset, C2F improves the F-score from 69.4% to 72.8%, a gain of 3.4 percentage points. Furthermore, C2F has 4.7 M parameters, only 10.7% of the parameter count of the SASUM model.
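The F-score figures quoted in the abstract can be made concrete with a small sketch. The abstract does not define the metric, so the code below assumes the overlap-based F-score commonly used on video summarization benchmarks (harmonic mean of precision and recall over the frames selected for the summary). The function name `f_score` and the toy frame masks are hypothetical and for illustration only; they are not taken from the paper.

```python
def f_score(pred, gt):
    """Harmonic mean of precision and recall between two binary frame masks.

    pred, gt: equal-length sequences of 0/1 flags (1 = frame is in the summary).
    Assumed metric; the abstract itself does not spell out the definition.
    """
    if len(pred) != len(gt):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, g in zip(pred, gt) if p and g)
    n_pred = sum(1 for p in pred if p)
    n_gt = sum(1 for g in gt if g)
    if overlap == 0 or n_pred == 0 or n_gt == 0:
        return 0.0
    precision = overlap / n_pred
    recall = overlap / n_gt
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy masks over 10 frames (not data from the paper).
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1, 0, 0, 1, 0]
score = f_score(predicted, reference)

# Arithmetic check on the figures quoted in the abstract.
gain_points = round(72.8 - 69.4, 1)    # absolute gain in percentage points
implied_sasum_params_m = 4.7 / 0.107   # SASUM size (in millions) implied by the 10.7% ratio
```

On these toy masks, precision and recall are both 0.75, so the score is 0.75. The last line implies roughly 44 M parameters for SASUM; that value is an inference from the quoted ratio, not a figure stated in the abstract.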
Pages: 12