Spatial-frequency attention-based optical and scene flow with cross-modal knowledge distillation

Cited by: 1

Authors:

Zhou, Youjie [1]

Jiao, Runyu [1]

Tao, Zhonghan [1]

Liang, Xichang [1]

Wan, Yi [1]

Affiliations:

[1] Shandong Univ, Sch Mech Engn, Jinan, Peoples R China

Source:

VISUAL COMPUTER | 2024

Keywords:

Optical and scene flow; Multimodal fusion; Spatial-frequency domain transform; Attention; Knowledge distillation

DOI:

10.1007/s00371-024-03654-2

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

This paper studies the problem of multimodal fusion for optical and scene flow estimation from RGB and depth images or point clouds. Previous methods fuse multimodal information with "early-fusion" or "late-fusion" strategies, in which an attention mechanism is employed to handle optical and scene flow estimation when RGB information is unreliable. Such attentive approaches either suffer from substantial computational and time complexity or lose the inherent characteristics of features due to downsampling. To address this issue, we propose a novel multimodal fusion approach named SFRAFT, which uses the Fourier transform to build spatial-frequency domain transformed self-attention and cross-attention. With this attention mechanism, our approach can extract informative features more efficiently and effectively. We further enhance information exchange between the two modalities by incorporating multi-scale knowledge distillation. Experimental results on FlyingThings3D and KITTI show that SFRAFT achieves the best performance with low computational and time complexity. We also demonstrate the strong flow estimation ability of our approach on our real-world dataset. We release the code and datasets at https://doi.org/10.5281/zenodo.12697968.

Pages: 4183-4198

Page count: 16
