Accurate video saliency prediction via hierarchical fusion and temporal recurrence

Cited by: 2
Authors
Zhang, Yunzuo [1 ]
Zhang, Tian [1 ]
Wu, Cunyu [1 ]
Zheng, Yuxin [1 ]
Affiliations
[1] Shijiazhuang Tiedao Univ, Sch Informat Sci & Technol, Shijiazhuang 050043, Hebei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video saliency prediction; Hierarchical spatiotemporal feature; Temporal recurrence; 3D convolutional network; Attention mechanism; CONVOLUTIONAL NETWORKS; NEURAL-NETWORK; MODEL; EYE;
DOI
10.1016/j.imavis.2023.104744
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the ability to extract spatiotemporal features, 3D convolutional networks have become the mainstream method for Video Saliency Prediction (VSP). However, these methods cannot make full use of hierarchical spatiotemporal features and also lack focus on past salient features, which hinders further improvements in accuracy. To address these issues, we propose a 3D convolutional Network based on Hierarchical Fusion and Temporal Recurrence (HFTR-Net) for VSP. Specifically, we propose a Bi-directional Temporal-Spatial Feature Pyramid (BiTSFP), which adds a flow of shallow location information to the previous flow of deep semantic information. Then, different from simple addition and concatenation, we design a Hierarchical Adaptive Fusion (HAF) mechanism that adaptively learns the fusion weights of adjacent features to integrate them appropriately. Moreover, to utilize previous salient information, a Recall 3D convGRU (R3D GRU) module is integrated into the 3D convolution-based method for the first time. It subtly combines the local feature extraction of the 3D backbone with the long-term relationship modeling of the temporal recurrence mechanism. Experimental results on three common datasets demonstrate that HFTR-Net outperforms existing state-of-the-art methods in accuracy. © 2023 Elsevier B.V. All rights reserved.
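The abstract describes the HAF mechanism only at a high level, so a minimal PyTorch sketch of the general idea is given below: learning input-dependent weights to fuse two adjacent pyramid feature levels, rather than fusing them by plain addition or concatenation. All module and variable names here are hypothetical illustrations, not the authors' implementation.

import torch
import torch.nn as nn

class HierarchicalAdaptiveFusion(nn.Module):
    # Hypothetical sketch: learns input-dependent weights for fusing two
    # adjacent feature levels, instead of fixed addition or concatenation.
    def __init__(self, channels):
        super().__init__()
        self.weight_net = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),                    # global context: (N, 2C, 1, 1, 1)
            nn.Conv3d(2 * channels, 2, kernel_size=1),  # two fusion logits per sample
        )

    def forward(self, shallow, deep):
        # shallow, deep: (N, C, T, H, W), already resized to a common shape
        logits = self.weight_net(torch.cat([shallow, deep], dim=1))
        w = torch.softmax(logits, dim=1)                # weights sum to 1 per sample
        return w[:, 0:1] * shallow + w[:, 1:2] * deep   # convex combination

haf = HierarchicalAdaptiveFusion(channels=64)
a = torch.randn(2, 64, 8, 28, 28)   # shallow pyramid level
b = torch.randn(2, 64, 8, 28, 28)   # deep pyramid level (upsampled)
out = haf(a, b)                     # torch.Size([2, 64, 8, 28, 28])

The softmax ties the two weights together so the fusion remains a convex combination; the actual HFTR-Net design may compute the weights spatially rather than from pooled global context.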
Pages: 12
Related Papers
50 records in total
  • [1] Video saliency prediction via spatio-temporal reasoning
    Chen, Jiazhong
    Li, Zongyi
    Jin, Yi
    Ren, Dakai
    Ling, Hefei
    NEUROCOMPUTING, 2021, 462 : 59 - 68
  • [2] Superpixel-based video saliency detection via the fusion of spatiotemporal saliency and temporal coherency
    Li, Yandi
    Xu, Xiping
    Zhang, Ning
    Du, Enyu
    OPTICAL ENGINEERING, 2019, 58 (08)
  • [3] The Visual Saliency Transformer Goes Temporal: TempVST for Video Saliency Prediction
    Lazaridis, Nikos
    Georgiadis, Kostas
    Kalaganis, Fotis
    Kordopatis-Zilos, Giorgos
    Papadopoulos, Symeon
    Nikolopoulos, Spiros
    Kompatsiaris, Ioannis
    IEEE ACCESS, 2024, 12 : 129705 - 129716
  • [4] Accurate Object Segmentation for Video Sequences via Temporal-Spatial-Frequency Saliency Model
    Xu, Bing
    Niu, Yanxiong
    IEEE INTELLIGENT SYSTEMS, 2018, 33 (01) : 18 - 28
  • [5] Contrast Based Hierarchical Spatial-Temporal Saliency for Video
    Le, Trung-Nghia
    Sugimoto, Akihiro
    IMAGE AND VIDEO TECHNOLOGY, PSIVT 2015, 2016, 9431 : 734 - 748
  • [6] GFNet: gated fusion network for video saliency prediction
    Wu, Songhe
    Zhou, Xiaofei
    Sun, Yaoqi
    Gao, Yuhan
    Zhu, Zunjie
    Zhang, Jiyong
    Yan, Chenggang
    APPLIED INTELLIGENCE, 2023, 53 (22) : 27865 - 27875
  • [7] VIDEO RETARGETING WITH NONLINEAR SPATIAL-TEMPORAL SALIENCY FUSION
    Lu, Taoran
    Yuan, Zheng
    Huang, Yu
    Wu, Dapeng
    Yu, Heather
    2010 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 2010, : 1801 - 1804
  • [8] Hierarchical spatiotemporal Feature Interaction Network for video saliency prediction
    Jin, Yingjie
    Zhou, Xiaofei
    Zhang, Zhenjie
    Fang, Hao
    Shi, Ran
    Xu, Xiaobin
    IMAGE AND VISION COMPUTING, 2025, 154
  • [9] Learning Coupled Convolutional Networks Fusion for Video Saliency Prediction
    Wu, Zhe
    Su, Li
    Huang, Qingming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29 (10) : 2960 - 2971