Spatial-frequency attention-based optical and scene flow with cross-modal knowledge distillation

Times Cited: 1
Authors
Zhou, Youjie [1]
Jiao, Runyu [1]
Tao, Zhonghan [1]
Liang, Xichang [1]
Wan, Yi [1]
Affiliations
[1] Shandong Univ, Sch Mech Engn, Jinan, Peoples R China
Keywords
Optical and scene flow; Multimodal fusion; Spatial-frequency domain transform; Attention; Knowledge distillation
DOI
10.1007/s00371-024-03654-2
CLC Number (Chinese Library Classification)
TP31 [Computer Software]
Discipline Code
081202; 0835
Abstract
This paper studies the problem of multimodal fusion for optical and scene flow estimation from RGB and depth images or point clouds. Previous methods fuse multimodal information with "early-fusion" or "late-fusion" strategies, in which an attention mechanism is employed to handle optical and scene flow estimation when RGB information is unreliable. Such attentive approaches either suffer from substantial computational and time complexity or lose the inherent characteristics of features due to downsampling. To address this issue, we propose a novel multimodal fusion approach named SFRAFT, which utilizes the Fourier transform to build spatial-frequency domain transformed self-attention and cross-attention. With this novel attention mechanism, our approach extracts informative features more efficiently and effectively. We further enhance information exchange between the two modalities by incorporating multi-scale knowledge distillation. Experimental results on FlyingThings3D and KITTI show that SFRAFT achieves the best performance with low computational and time complexity. We also demonstrate the strong flow-estimation ability of our approach on our real-world dataset. We release the code and datasets at https://doi.org/10.5281/zenodo.12697968.
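The abstract only sketches the two core ideas (frequency-domain attention and multi-scale cross-modal distillation). The PyTorch snippet below is a rough illustrative sketch of those ideas, not the authors' released SFRAFT code; the module and function names (SpatialFrequencyAttention, CrossModalFusion, multiscale_distillation_loss) and all design details are assumptions made for illustration. Refer to the Zenodo link above for the actual implementation.

```python
# Illustrative sketch only -- NOT the authors' SFRAFT implementation.
# Shows (1) frequency-domain feature mixing via FFT, an O(N log N)
# alternative to quadratic spatial self-attention, (2) a gated
# cross-modal fusion between RGB and depth features, and (3) a
# multi-scale feature-distillation loss. Requires PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialFrequencyAttention(nn.Module):
    """Mix features globally in the frequency domain:
    FFT -> learned per-bin complex filter -> inverse FFT."""

    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # One learnable complex-valued filter per channel and frequency bin.
        self.filter = nn.Parameter(
            torch.randn(channels, height, width // 2 + 1, 2) * 0.02
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        freq = torch.fft.rfft2(x, norm="ortho")            # (B, C, H, W//2+1), complex
        freq = freq * torch.view_as_complex(self.filter)   # per-bin modulation
        out = torch.fft.irfft2(freq, s=x.shape[-2:], norm="ortho")
        return x + out                                      # residual connection


class CrossModalFusion(nn.Module):
    """Exchange information between RGB and depth features with a
    lightweight gating; unreliable RGB regions can be down-weighted
    in favour of geometry (details are assumptions)."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate_rgb = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.gate_dep = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, dep: torch.Tensor):
        both = torch.cat([rgb, dep], dim=1)
        rgb_out = rgb * torch.sigmoid(self.gate_rgb(both))
        dep_out = dep * torch.sigmoid(self.gate_dep(both))
        return rgb_out, dep_out


def multiscale_distillation_loss(student_feats, teacher_feats):
    """Match student features to detached teacher features at every
    pyramid scale (one possible form of multi-scale distillation)."""
    return sum(
        F.mse_loss(s, t.detach()) for s, t in zip(student_feats, teacher_feats)
    )


if __name__ == "__main__":
    rgb = torch.randn(2, 64, 32, 32)
    dep = torch.randn(2, 64, 32, 32)
    attn = SpatialFrequencyAttention(64, 32, 32)
    fuse = CrossModalFusion(64)
    r, d = fuse(attn(rgb), attn(dep))
    print(r.shape, d.shape)  # both torch.Size([2, 64, 32, 32])
```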
Pages: 4183-4198
Page count: 16
Related Papers
50 records in total
  • [21] Takashima, Yuki; Takashima, Ryoichi; Tsunoda, Ryota; Aihara, Ryo; Takiguchi, Tetsuya; Ariki, Yasuo; Motoyama, Nobuaki. Unsupervised domain adaptation for lip reading based on cross-modal knowledge distillation. EURASIP Journal on Audio, Speech, and Music Processing, 2021.
  • [22] Li, Jiaxing; Wong, Wai Keung; Jiang, Lin; Fang, Xiaozhao; Xie, Shengli; Xu, Yong. CKDH: CLIP-Based Knowledge Distillation Hashing for Cross-Modal Retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(07): 6530-6541.
  • [23] Prakash, V. Jothi; Vijay, S. Arul Antran; Kumar, P. Ganesh; Karthikeyan, N. K. A novel attention-based cross-modal transfer learning framework for predicting cardiovascular disease. Computers in Biology and Medicine, 2024, 170.
  • [24] Xu, Yiming. Multihead Attention-based Audio Image Generation with Cross-Modal Shared Weight Classifier. 2023 International Joint Conference on Neural Networks (IJCNN), 2023.
  • [25] Gu, Jiyou; Dong, Huiqin. The effect of gender stereotypes on cross-modal spatial attention. Social Behavior and Personality, 2021, 49(09).
  • [26] Wang, Bo; Wu, Xiaohan; Wang, Fei; Zhang, Yushu; Wei, Fei; Song, Zengren. Spatial-frequency feature fusion based deepfake detection through knowledge distillation. Engineering Applications of Artificial Intelligence, 2024, 133.
  • [27] Su, Mingyue; Gu, Guanghua; Ren, Xianlong; Fu, Hao; Zhao, Yao. Semi-Supervised Knowledge Distillation for Cross-Modal Hashing. IEEE Transactions on Multimedia, 2023, 25: 662-675.
  • [28] Ni, Jianyuan; Ngu, Anne H. H.; Yan, Yan. Progressive Cross-modal Knowledge Distillation for Human Action Recognition. Proceedings of the 30th ACM International Conference on Multimedia (MM 2022), 2022: 5903-5912.
  • [29] Gao, Liqing; Shi, Peng; Hu, Lianyu; Feng, Jichao; Zhu, Lei; Wan, Liang; Feng, Wei. Cross-modal knowledge distillation for continuous sign language recognition. Neural Networks, 2024, 179.
  • [30] Wang, Sijie; She, Rui; Kang, Qiyu; Jian, Xingchao; Zhao, Kai; Song, Yang; Tay, Wee Peng. DistilVPR: Cross-Modal Knowledge Distillation for Visual Place Recognition. Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol 38 No 9, 2024: 10377-10385.