Accurate video saliency prediction via hierarchical fusion and temporal recurrence

Times cited: 2

Authors:

Zhang, Yunzuo [1]

Zhang, Tian [1]

Wu, Cunyu [1]

Zheng, Yuxin [1]

Affiliation:

[1] Shijiazhuang Tiedao Univ, Sch Informat Sci & Technol, Shijiazhuang 050043, Hebei, Peoples R China

Source:

IMAGE AND VISION COMPUTING | 2023 / Vol. 136

Funding:

National Natural Science Foundation of China;

Keywords:

Video saliency prediction; Hierarchical spatiotemporal feature; Temporal recurrence; 3D convolutional network; Attention mechanism; CONVOLUTIONAL NETWORKS; NEURAL-NETWORK; MODEL; EYE;

DOI:

10.1016/j.imavis.2023.104744

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With their ability to extract spatiotemporal features, 3D convolutional networks have become the mainstream approach to Video Saliency Prediction (VSP). However, these methods cannot make full use of hierarchical spatiotemporal features and pay little attention to past salient features, which hinders further improvements in accuracy. To address these issues, we propose a 3D convolutional network based on Hierarchical Fusion and Temporal Recurrence (HFTR-Net) for VSP. Specifically, we propose a Bi-directional Temporal-Spatial Feature Pyramid (BiTSFP), which adds a flow of shallow location information to the existing flow of deep semantic information. Then, rather than relying on simple addition or concatenation, we design a Hierarchical Adaptive Fusion (HAF) mechanism that adaptively learns fusion weights for adjacent features so that they are integrated appropriately. Moreover, to exploit previous salient information, a Recall 3D convGRU (R3D GRU) module is integrated into a 3D convolution-based method for the first time. It combines the local feature extraction of the 3D backbone with the long-term relationship modeling of the temporal recurrence mechanism. Experimental results on three common datasets demonstrate that HFTR-Net outperforms existing state-of-the-art methods in accuracy. © 2023 Elsevier B.V. All rights reserved.
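The abstract's HAF idea, learning weights for adjacent features instead of plain addition or concatenation, can be illustrated with a minimal pure-Python sketch. It assumes BiFPN-style fast normalized fusion (ReLU-constrained scalar weights divided by their sum) as a stand-in technique; the paper's actual HAF may differ. The function name `fast_normalized_fusion`, the flat-list feature representation, and the `eps` parameter are hypothetical choices made for illustration, not details taken from the paper.

```python
# Hypothetical representation: a feature map is a flat list of floats
# (one value per spatiotemporal position), so only the stdlib is needed.
FeatureMap = list[float]


def fast_normalized_fusion(features: list[FeatureMap],
                           raw_weights: list[float],
                           eps: float = 1e-4) -> FeatureMap:
    """Fuse same-shaped feature maps with learnable, non-negative weights.

    Assumed stand-in for HFTR-Net's HAF (BiFPN-style fast normalized
    fusion), not the authors' implementation. Each raw weight is clipped
    at zero (ReLU), then divided by the sum of all clipped weights plus
    eps, so the effective weights are non-negative and sum to about 1.
    """
    if not features or len(features) != len(raw_weights):
        raise ValueError("need one weight per input feature map")
    size = len(features[0])
    if any(len(f) != size for f in features):
        raise ValueError("resize adjacent features to a common shape first")

    relu_w = [max(w, 0.0) for w in raw_weights]
    total = sum(relu_w) + eps
    norm_w = [w / total for w in relu_w]

    # Weighted sum of the inputs, computed position by position.
    return [sum(w * f[i] for w, f in zip(norm_w, features))
            for i in range(size)]


if __name__ == "__main__":
    # Hypothetical values: an upsampled deep map and a shallow map.
    deep = [1.0, 2.0, 3.0]
    shallow = [3.0, 2.0, 1.0]
    # Raw weights 1 and 3 normalize to 0.25 and 0.75 when eps is 0.
    print(fast_normalized_fusion([deep, shallow], [1.0, 3.0], eps=0.0))
    # prints [2.5, 2.0, 1.5]
```

In a trained network the raw weights would be learnable parameters updated by backpropagation; here they are fixed numbers so the normalization step is easy to follow. A negative raw weight is clipped to zero, which effectively drops that input from the fusion.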

Pages: 12
