Spatial-frequency attention-based optical and scene flow with cross-modal knowledge distillation

Cited by: 1
Authors
Zhou, Youjie [1 ]
Jiao, Runyu [1 ]
Tao, Zhonghan [1 ]
Liang, Xichang [1 ]
Wan, Yi [1 ]
Affiliations
[1] Shandong Univ, Sch Mech Engn, Jinan, Peoples R China
Keywords
Optical and scene flow; Multimodal fusion; Spatial-frequency domain transform; Attention; Knowledge distillation;
DOI
10.1007/s00371-024-03654-2
CLC number
TP31 [Computer Software];
Discipline codes
081202; 0835;
Abstract
This paper studies the problem of multimodal fusion for optical and scene flow from RGB and depth images, or point clouds. Previous methods fuse multimodal information in "early-fusion" or "late-fusion" strategies, in which an attention mechanism is employed to address the problem of optical and scene flow estimation when RGB information is unreliable. Such attentive approaches either suffer from substantial computational and time complexity or lose the inherent characteristics of features due to downsampling. To address this issue, we propose a novel multimodal fusion approach named SFRAFT, which utilizes the Fourier transform to build spatial-frequency domain transformed self-attention and cross-attention. With this novel attentive mechanism, our approach extracts informative features more efficiently and effectively. We further enhance information exchange between the two modalities by incorporating multi-scale knowledge distillation. Experimental results on FlyingThings3D and KITTI show that our SFRAFT achieves the best performance with low computational and time complexity. We also demonstrate the strong flow-estimation ability of our approach on our real-world dataset. We release the code and datasets at https://doi.org/10.5281/zenodo.12697968.
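The record gives no implementation details for the spatial-frequency attention, so the following is only a minimal sketch of the general idea the abstract names: replacing pairwise dot-product attention with global mixing in the Fourier domain. All names and shapes here are hypothetical, not taken from SFRAFT.

```python
import numpy as np

def spatial_frequency_mixing(x, w):
    """Hypothetical sketch of frequency-domain feature mixing.

    x: (H, W, C) real-valued feature map.
    w: (H, W//2 + 1, C) filter applied per frequency band (learnable in
       a real model; fixed here for illustration).

    Each frequency coefficient aggregates information from every spatial
    position, so modulating it acts as a global "attention" operation at
    O(HW log HW) cost instead of the O((HW)^2) of dot-product attention.
    """
    X = np.fft.rfft2(x, axes=(0, 1))            # spatial -> frequency domain
    Y = X * w                                   # modulate each frequency band
    return np.fft.irfft2(Y, s=x.shape[:2], axes=(0, 1))  # back to spatial

# An all-ones filter is the identity: the round trip reconstructs x.
x = np.random.rand(8, 8, 4)
w = np.ones((8, 8 // 2 + 1, 4))
y = spatial_frequency_mixing(x, w)
assert y.shape == x.shape
assert np.allclose(x, y)
```

In a full model the filter (or a frequency-domain attention map derived from a second modality, for cross-attention) would be learned, and the mixed features would feed the flow decoder; this sketch only shows why the FFT route avoids quadratic attention cost.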
Pages: 4183-4198
Page count: 16