Cross-modal domain generalization semantic segmentation based on fusion features

Citations: 0
Authors
Yue, Wanlin [1 ]
Zhou, Zhiheng [1 ]
Cao, Yinglie [2 ]
Liuman [3 ]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510640, Peoples R China
[2] Guangzhou City Univ Technol, Sch Elect & Informat Engn & Commun Engn, Guangzhou 510850, Peoples R China
[3] Wise Secur Technol Guangzhou Co Ltd, Guangzhou 510663, Peoples R China
Keywords
Domain generalization; Semantic segmentation; Cross-modal;
DOI
10.1016/j.knosys.2024.112356
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The primary techniques for domain generalization in semantic segmentation are domain randomization and feature whitening. Although less commonly employed, cross-modality-based methods have also proven effective. This paper improves cross-modal feature alignment by redesigning the feature alignment module: alignment across modalities is driven by fusion features derived from both visual and textual inputs. These fusion features provide a more effective anchor point for alignment, enhancing the transfer of semantic information from the textual to the visual domain. Furthermore, the decoder plays a crucial role in the model, as its ability to categorize features directly determines the segmentation performance of the whole model. To strengthen the decoder, this study feeds the fusion features into the decoder, with image labels providing supervision. Experimental results indicate that our approach significantly enhances the model's generalization capability.
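The fusion-anchor idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual formulation: the function names, the simple weighted-sum fusion with weight `alpha`, and the cosine-distance alignment loss are all assumptions introduced for clarity.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale feature vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def fuse_features(visual, text, alpha=0.5):
    """Blend per-class visual features with text embeddings into
    fusion anchors (a hypothetical weighted-sum fusion)."""
    return l2_normalize(alpha * visual + (1.0 - alpha) * text)

def alignment_loss(visual, anchors):
    """Mean cosine distance between visual features and their
    fusion anchors; lower means better cross-modal alignment."""
    v = l2_normalize(visual)
    a = l2_normalize(anchors)
    return float(np.mean(1.0 - np.sum(v * a, axis=-1)))

# Illustrative usage: 4 classes, 8-dim embeddings.
rng = np.random.default_rng(0)
visual = rng.normal(size=(4, 8))   # per-class visual features
text = rng.normal(size=(4, 8))     # per-class text embeddings
anchors = fuse_features(visual, text)
loss = alignment_loss(visual, anchors)
```

Minimizing such a loss would pull visual features toward anchors that carry textual semantics, which is the role the abstract assigns to fusion features; the paper's actual module and losses may differ substantially.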
Pages: 10