Spatial-frequency attention-based optical and scene flow with cross-modal knowledge distillation

Cited by: 1
Authors
Zhou, Youjie [1]
Jiao, Runyu [1]
Tao, Zhonghan [1]
Liang, Xichang [1]
Wan, Yi [1]
Affiliations
[1] Shandong Univ, Sch Mech Engn, Jinan, Peoples R China
Source
Keywords
Optical and scene flow; Multimodal fusion; Spatial-frequency domain transform; Attention; Knowledge distillation
DOI
10.1007/s00371-024-03654-2
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
This paper studies multimodal fusion for optical and scene flow estimation from RGB and depth images or point clouds. Previous methods fuse multimodal information using "early-fusion" or "late-fusion" strategies, in which an attention mechanism is employed to estimate optical and scene flow when RGB information is unreliable. Such attentive approaches either incur substantial computational and time complexity or lose the inherent characteristics of the features due to downsampling. To address this issue, we propose a novel multimodal fusion approach named SFRAFT, which uses the Fourier transform to build spatial-frequency domain transformed self-attention and cross-attention. With this attention mechanism, our approach extracts informative features more efficiently and effectively. We further enhance information exchange between the two modalities by incorporating multi-scale knowledge distillation. Experimental results on FlyingThings3D and KITTI show that SFRAFT achieves the best performance with low computational and time complexity. We also demonstrate the strong flow-estimation ability of our approach on our own real-world dataset. We release the code and datasets at https://doi.org/10.5281/zenodo.12697968.
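The two ideas named in the abstract, mixing features in the frequency domain and distilling features at several scales, can be illustrated with a minimal NumPy sketch. This is not the SFRAFT implementation: the function names `frequency_filter_mix` and `multiscale_distill_loss`, the per-frequency weight tensor, and the mean-squared-error loss are all assumptions made for illustration, and the released code should be consulted for the actual design. The sketch only shows why a Fourier transform gives global mixing cheaply: an element-wise product in the frequency domain equals a circular convolution over the whole spatial grid.

```python
import numpy as np


def frequency_filter_mix(feat, weight):
    """Globally mix a (C, H, W) feature map by filtering its 2D spectrum.

    `weight` is a hypothetical complex filter of shape (C, H, W // 2 + 1);
    in a trained network it would be learned. Not the authors' attention.
    """
    # Real-input 2D FFT over the two spatial axes.
    spec = np.fft.rfft2(feat, axes=(-2, -1))
    # Element-wise product in frequency = circular convolution in space.
    spec = spec * weight
    # Back to the spatial domain at the original resolution.
    return np.fft.irfft2(spec, s=feat.shape[-2:], axes=(-2, -1))


def multiscale_distill_loss(student_feats, teacher_feats):
    """Average mean-squared error between paired feature maps over scales.

    An assumed, generic feature-distillation loss; the loss used in the
    paper may differ.
    """
    per_scale = [np.mean((s - t) ** 2)
                 for s, t in zip(student_feats, teacher_feats)]
    return float(np.mean(per_scale))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 8, 8))
    # An all-ones filter leaves the features unchanged (up to rounding).
    identity = np.ones((2, 8, 5), dtype=complex)
    print(np.allclose(frequency_filter_mix(x, identity), x))
    # Identical student and teacher features give zero loss.
    print(multiscale_distill_loss([x, x[:, ::2, ::2]], [x, x[:, ::2, ::2]]))
```

The filter acts on all spatial positions at once at FFT cost, which is the general reason frequency-domain mixing is attractive compared with full pairwise attention on high-resolution features.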
Pages: 4183-4198
Page count: 16