Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection

Cited by: 261

Authors:

Chen, Hao [1]

Li, Youfu [1]

Su, Dan [1]

Affiliation:

[1] City Univ Hong Kong, Dept Mech Engn, 83 Tat Chee Ave, Kowloon Tong, Hong Kong, Peoples R China

Source:

PATTERN RECOGNITION | 2019 / Vol. 86

Keywords:

RGB-D; Convolutional neural networks; Multi-path; Saliency detection; DETECTION MODEL; VIDEO;

DOI:

10.1016/j.patcog.2018.08.007

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Paired RGB and depth images are becoming popular multi-modal data adopted in computer vision tasks. Traditional methods based on Convolutional Neural Networks (CNNs) typically fuse RGB and depth by combining their deep representations in a late stage with only one path, which can be ambiguous and insufficient for fusing large amounts of cross-modal data. To address this issue, we propose a novel multi-scale multi-path fusion network with cross-modal interactions (MMCI), in which the traditional two-stream fusion architecture with single fusion path is advanced by diversifying the fusion path to a global reasoning one and another local capturing one and meanwhile introducing cross-modal interactions in multiple layers. Compared to traditional two-stream architectures, the MMCI net is able to supply more adaptive and flexible fusion flows, thus easing the optimization and enabling sufficient and efficient fusion. Concurrently, the MMCI net is equipped with multi-scale perception ability (i.e., simultaneously global and local contextual reasoning). We take RGB-D saliency detection as an example task. Extensive experiments on three benchmark datasets show the improvement of the proposed MMCI net over other state-of-the-art methods. © 2018 Elsevier Ltd. All rights reserved.

Pages: 376 - 385

Page count: 10

50 items in total

[41] Attention-aware Cross-modal Cross-level Fusion Network for RGB-D Salient Object Detection
Chen, Hao
Li, You-Fu
Su, Dan
2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2018, : 6821 - 6826
[42] Disentangled Cross-Modal Transformer for RGB-D Salient Object Detection and Beyond
Chen, Hao
Shen, Feihong
Ding, Ding
Deng, Yongjian
Li, Chao
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 1699 - 1709
[43] Joint Cross-Modal and Unimodal Features for RGB-D Salient Object Detection
Huang, Nianchang
Liu, Yi
Zhang, Qiang
Han, Jungong
IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 2428 - 2441
[44] Lightweight Multi-modal Representation Learning for RGB Salient Object Detection
Xiao, Yun
Huang, Yameng
Li, Chenglong
Liu, Lei
Zhou, Aiwu
Tang, Jin
COGNITIVE COMPUTATION, 2023, 15 (06) : 1868 - 1883
[45] Lightweight Multi-modal Representation Learning for RGB Salient Object Detection
Yun Xiao
Yameng Huang
Chenglong Li
Lei Liu
Aiwu Zhou
Jin Tang
Cognitive Computation, 2023, 15 : 1868 - 1883
[46] Cross-modal refined adjacent-guided network for RGB-D salient object detection
Bi H.
Zhang J.
Wu R.
Tong Y.
Jin W.
Multimedia Tools Appl, 24 (37453-37478) : 37453 - 37478
[47] SLMSF-Net: A Semantic Localization and Multi-Scale Fusion Network for RGB-D Salient Object Detection
Peng, Yanbin
Zhai, Zhinian
Feng, Mingkun
SENSORS, 2024, 24 (04)
[48] Feature interaction and two-stage cross-modal fusion for RGB-D salient object detection
Yu, Ming
Liu, Jiali
Liu, Yi
Yan, Gang
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2024, 46 (02) : 4543 - 4556
[49] RGB-D Salient Object Detection Based on Cross-Modal Fusion and Boundary Deformable Convolution Guidance
Meng L.-B.
Yuan M.-Y.
Shi X.-H.
Zhang L.
Wu J.-H.
Cheng F.
Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2023, 51 (11) : 3155 - 3166
[50] Multi-modal deep network for RGB-D segmentation of clothes
Joukovsky, B.
Hu, P.
Munteanu, A.
ELECTRONICS LETTERS, 2020, 56 (09) : 432 - 434
