Accurate video saliency prediction via hierarchical fusion and temporal recurrence

Cited by: 2
Authors
Zhang, Yunzuo [1 ]
Zhang, Tian [1 ]
Wu, Cunyu [1 ]
Zheng, Yuxin [1 ]
Affiliations
[1] Shijiazhuang Tiedao Univ, Sch Informat Sci & Technol, Shijiazhuang 050043, Hebei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video saliency prediction; Hierarchical spatiotemporal feature; Temporal recurrence; 3D convolutional network; Attention mechanism; CONVOLUTIONAL NETWORKS; NEURAL-NETWORK; MODEL; EYE;
DOI
10.1016/j.imavis.2023.104744
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the ability to extract spatiotemporal features, 3D convolutional networks have become the mainstream method for Video Saliency Prediction (VSP). However, these methods cannot make full use of hierarchical spatiotemporal features and also lack focus on past salient features, which hinders further improvements in accuracy. To address these issues, we propose a 3D convolutional Network based on Hierarchical Fusion and Temporal Recurrence (HFTR-Net) for VSP. Specifically, we propose a Bi-directional Temporal-Spatial Feature Pyramid (BiTSFP), which adds a flow of shallow location information to the previous flow of deep semantic information. Then, different from simple addition and concatenation, we design a Hierarchical Adaptive Fusion (HAF) mechanism that adaptively learns the fusion weights of adjacent features to integrate them appropriately. Moreover, to utilize previous salient information, a Recall 3D convGRU (R3D GRU) module is integrated into the 3D convolution-based method for the first time. It subtly combines the local feature extraction of the 3D backbone with the long-term relationship modeling of the temporal recurrence mechanism. Experimental results on three common datasets demonstrate that HFTR-Net outperforms existing state-of-the-art methods in accuracy. © 2023 Elsevier B.V. All rights reserved.
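The abstract describes the HAF mechanism only at a high level, so a minimal PyTorch sketch of the general idea is given below: learning input-dependent weights to fuse two adjacent pyramid feature levels, rather than fusing them by plain addition or concatenation. All module and variable names here are hypothetical illustrations, not the authors' implementation.

import torch
import torch.nn as nn

class HierarchicalAdaptiveFusion(nn.Module):
    # Hypothetical sketch: learns input-dependent weights for fusing two
    # adjacent feature levels, instead of fixed addition or concatenation.
    def __init__(self, channels):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),                    # global context: (N, 2C, 1, 1, 1)
            nn.Conv3d(2 * channels, 2, kernel_size=1),  # two fusion logits per sample
        )

    def forward(self, shallow, deep):
        # shallow, deep: (N, C, T, H, W), already resized to a common shape
        logits = self.weight_net(torch.cat([shallow, deep], dim=1))
        w = torch.softmax(logits, dim=1)                # weights sum to 1 per sample
        return w[:, 0:1] * shallow + w[:, 1:2] * deep   # convex combination

haf = HierarchicalAdaptiveFusion(channels=64)
a = torch.randn(2, 64, 8, 28, 28)   # shallow pyramid level
b = torch.randn(2, 64, 8, 28, 28)   # deep pyramid level (upsampled)
out = haf(a, b)                     # torch.Size([2, 64, 8, 28, 28])

The softmax ties the two weights together so the fusion remains a convex combination; the actual HFTR-Net design may compute the weights spatially rather than from pooled global context.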
Pages: 12
Related Papers
50 records in total
  • [1] Video saliency prediction via spatio-temporal reasoning
    Chen, Jiazhong
    Li, Zongyi
    Jin, Yi
    Ren, Dakai
    Ling, Hefei
    NEUROCOMPUTING, 2021, 462 : 59 - 68
  • [2] Superpixel-based video saliency detection via the fusion of spatiotemporal saliency and temporal coherency
    Li, Yandi
    Xu, Xiping
    Zhang, Ning
    Du, Enyu
    OPTICAL ENGINEERING, 2019, 58 (08)
  • [3] The Visual Saliency Transformer Goes Temporal: TempVST for Video Saliency Prediction
    Lazaridis, Nikos
    Georgiadis, Kostas
    Kalaganis, Fotis
    Kordopatis-Zilos, Giorgos
    Papadopoulos, Symeon
    Nikolopoulos, Spiros
    Kompatsiaris, Ioannis
    IEEE ACCESS, 2024, 12 : 129705 - 129716
  • [4] Accurate Object Segmentation for Video Sequences via Temporal-Spatial-Frequency Saliency Model
    Xu, Bing
    Niu, Yanxiong
    IEEE INTELLIGENT SYSTEMS, 2018, 33 (01) : 18 - 28
  • [5] Contrast Based Hierarchical Spatial-Temporal Saliency for Video
    Le, Trung-Nghia
    Sugimoto, Akihiro
    IMAGE AND VIDEO TECHNOLOGY, PSIVT 2015, 2016, 9431 : 734 - 748
  • [6] GFNet: gated fusion network for video saliency prediction
    Wu, Songhe
    Zhou, Xiaofei
    Sun, Yaoqi
    Gao, Yuhan
    Zhu, Zunjie
    Zhang, Jiyong
    Yan, Chenggang
    APPLIED INTELLIGENCE, 2023, 53 (22) : 27865 - 27875
  • [7] VIDEO RETARGETING WITH NONLINEAR SPATIAL-TEMPORAL SALIENCY FUSION
    Lu, Taoran
    Yuan, Zheng
    Huang, Yu
    Wu, Dapeng
    Yu, Heather
    2010 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 2010, : 1801 - 1804
  • [8] Hierarchical spatiotemporal Feature Interaction Network for video saliency prediction
    Jin, Yingjie
    Zhou, Xiaofei
    Zhang, Zhenjie
    Fang, Hao
    Shi, Ran
    Xu, Xiaobin
    IMAGE AND VISION COMPUTING, 2025, 154
  • [9] Learning Coupled Convolutional Networks Fusion for Video Saliency Prediction
    Wu, Zhe
    Su, Li
    Huang, Qingming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (10) : 2960 - 2971