Accurate video saliency prediction via hierarchical fusion and temporal recurrence

Cited by: 2
Authors
Zhang, Yunzuo [1 ]
Zhang, Tian [1 ]
Wu, Cunyu [1 ]
Zheng, Yuxin [1 ]
Affiliations
[1] Shijiazhuang Tiedao Univ, Sch Informat Sci & Technol, Shijiazhuang 050043, Hebei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video saliency prediction; Hierarchical spatiotemporal feature; Temporal recurrence; 3D convolutional network; Attention mechanism; CONVOLUTIONAL NETWORKS; NEURAL-NETWORK; MODEL; EYE;
DOI
10.1016/j.imavis.2023.104744
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
With the ability to extract spatiotemporal features, 3D convolutional networks have become the mainstream method for Video Saliency Prediction (VSP). However, these methods cannot make full use of hierarchical spatiotemporal features and also lack focus on past salient features, which hinders further improvements in accuracy. To address these issues, we propose a 3D convolutional Network based on Hierarchical Fusion and Temporal Recurrence (HFTR-Net) for VSP. Specifically, we propose a Bi-directional Temporal-Spatial Feature Pyramid (BiTSFP), which adds a flow of shallow location information alongside the previous flow of deep semantic information. Then, different from simple addition and concatenation, we design a Hierarchical Adaptive Fusion (HAF) mechanism that adaptively learns the fusion weights of adjacent features to integrate them appropriately. Moreover, to utilize previous salient information, a Recall 3D convGRU (R3D GRU) module is integrated into the 3D convolution-based method for the first time. It subtly combines the local feature extraction of the 3D backbone with the long-term relationship modeling of the temporal recurrence mechanism. Experimental results on three common datasets demonstrate that HFTR-Net outperforms existing state-of-the-art methods in accuracy. © 2023 Elsevier B.V. All rights reserved.
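The Hierarchical Adaptive Fusion (HAF) idea in the abstract — learning fusion weights for adjacent pyramid features rather than simply adding or concatenating them — can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the function name, the scalar-logit formulation, and the softmax normalization are all illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_adaptive_fusion(shallow, deep, w_logits):
    """Fuse two adjacent pyramid levels with learnable scalar weights.

    shallow, deep: feature maps of identical shape (C, T, H, W),
    e.g. a shallow (location-rich) and a deep (semantic) level.
    w_logits: two learnable logits; softmax turns them into convex
    fusion weights, so the fused map stays in the inputs' range.
    """
    w = softmax(np.asarray(w_logits, dtype=float))
    return w[0] * shallow + w[1] * deep

rng = np.random.default_rng(0)
shallow = rng.standard_normal((4, 2, 8, 8))
deep = rng.standard_normal((4, 2, 8, 8))

# Equal logits reduce to a plain average; training would shift
# the logits toward whichever level is more useful per layer.
fused = hierarchical_adaptive_fusion(shallow, deep, [0.0, 0.0])
```

In a real network these logits would be parameters optimized end to end (one pair per pyramid level), which is what distinguishes adaptive fusion from fixed addition.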
Pages: 12
Related Papers
50 records in total
  • [31] Spatio-Temporal Saliency Networks for Dynamic Saliency Prediction
    Bak, Cagdas
    Kocak, Aysun
    Erdem, Erkut
    Erdem, Aykut
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (07) : 1688 - 1698
  • [32] Aeroelastic force prediction via temporal fusion transformers
    Cid Montoya, Miguel
    Mishra, Ashutosh
    Verma, Sumit
    Mures, Omar A.
    Rubio-Medrano, Carlos E.
    COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, 2024,
  • [33] Temporal Fusion Transformers for streamflow Prediction: Value of combining attention with recurrence
    Koya, Sinan Rasiya
    Roy, Tirthankar
    JOURNAL OF HYDROLOGY, 2024, 637
  • [35] Video saliency detection using dynamic fusion of spatial-temporal features in complex background with disturbance
    Wu, Xiaofeng
    Institute of Computing Technology, 2016 (28)
  • [36] Prediction of visual saliency in video with deep CNNs
    Chaabouni, Souad
    Benois-Pineau, Jenny
    Hadar, Ofer
    APPLICATIONS OF DIGITAL IMAGE PROCESSING XXXIX, 2016, 9971
  • [37] VIDEO SALIENCY BASED ON RARITY PREDICTION: HYPERAPTOR
    Cassagne, Ioannis
    Riche, Nicolas
    Decombas, Marc
    Mancas, Matei
    Gosselin, Bernard
    Dutoit, Thierry
    Laganiere, Robert
    2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 1521 - 1525
  • [38] Video attention prediction using gaze saliency
    Chen, Yanxiang
    Tao, Gang
    Xie, Qiangqiang
    Song, Minglong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (19) : 26867 - 26884
  • [39] HMSFU: A hierarchical multi-scale fusion unit for video prediction and beyond
    Zhu, Hongchang
    Fang, Faming
    IET COMPUTER VISION, 2025, 19 (01)
  • [40] Video Saliency Estimation via Encoding Deep Spatiotemporal Saliency Cues
    Jun Wang
    Chang Tian
    Lei Hu
    Wang Hai
    Zeng Mingyong
    Qing Shen
    2018 10TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS AND SIGNAL PROCESSING (WCSP), 2018,