CAFA: Cross-Modal Attentive Feature Alignment for Cross-Domain Urban Scene Segmentation

Cited by: 1
Authors
Liu, Peng [1]
Ge, Yanqi [2]
Duan, Lixin [1, 3]
Li, Wen [2]
Lv, Fengmao [4, 5]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[3] Univ Elect Sci & Technol China, Sichuan Prov Peoples Hosp, Chengdu 610032, Peoples R China
[4] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[5] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transp, Chengdu 611756, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Semantic segmentation; Feature extraction; Training; Transformers; Estimation; Adaptation models; Autonomous vehicles; domain adaptation; semantic segmentation;
DOI
10.1109/TII.2024.3412006
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Autonomous driving systems rely heavily on semantic segmentation models for accurate and safe decision-making. High segmentation performance in real-world urban scenes is crucial for autonomous vehicles, but training such models requires large amounts of pixel-level labels. Unsupervised domain adaptation (UDA) techniques are widely used to adapt a segmentation model trained on synthetic data (i.e., the source domain) to real-world data (i.e., the target domain), since pixel-level annotations are fairly easy to obtain in a synthetic environment. Recently, a growing number of UDA approaches have improved cross-domain semantic segmentation (CDSS) by fusing depth information into the RGB features. However, feature fusion does not necessarily eliminate the domain-specific components of the RGB features, so the fused features can still be influenced by domain-specific information. To address this, we propose a novel cross-modal attentive feature alignment (CAFA) framework for CDSS, which provides an explicit perspective on using depth information to align the main backbone RGB features of both domains in a nonadversarial manner. In particular, considering that the depth modality is less affected by the domain gap, we employ depth as an intermediate modality and align the RGB features by attending them to the depth modality through an auxiliary multimodal segmentation task. CAFA achieves state-of-the-art performance on benchmark tasks such as SYNTHIA -> Cityscapes and Grand Theft Auto (GTA) -> Cityscapes.
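The abstract's idea of attending RGB features to the depth modality can be illustrated with a generic cross-attention step. The sketch below is not the authors' implementation: it assumes that queries come from RGB tokens and that keys and values come from depth tokens, and the function name `cross_modal_attention`, the projection matrices, and all shapes are hypothetical choices made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(rgb_feats, depth_feats, w_q, w_k, w_v):
    """Scaled dot-product cross-attention (illustrative assumption).

    rgb_feats:   (N, C) RGB tokens, used as queries.
    depth_feats: (M, C) depth tokens, used as keys and values.
    w_q, w_k, w_v: (C, D) hypothetical projection matrices.
    Returns the attended features (N, D) and the attention weights (N, M).
    """
    q = rgb_feats @ w_q
    k = depth_feats @ w_k
    v = depth_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy inputs: 16 tokens per modality, 32 channels, 8-dimensional projection.
rng = np.random.default_rng(0)
n_tokens, channels, dim = 16, 32, 8
rgb = rng.normal(size=(n_tokens, channels))
depth = rng.normal(size=(n_tokens, channels))
w_q, w_k, w_v = (rng.normal(size=(channels, dim)) * 0.1 for _ in range(3))

attended, weights = cross_modal_attention(rgb, depth, w_q, w_k, w_v)
```

Each row of `weights` is a distribution over depth tokens, so every output token is a mixture of depth-derived values selected by an RGB query. How the paper couples such a step to its auxiliary multimodal segmentation task is not specified in the abstract and is not modeled here.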
Pages: 11666-11675
Number of pages: 10
Related papers
50 in total
  • [21] ERP evidence for temporal differences between cross-modal and cross-domain analogical reasoning
    Zhao, Yanqun
    Guo, Jiajia
    Li, Yangzhuo
    Wu, Yuedong
    Luo, Junlong
    BEHAVIOURAL BRAIN RESEARCH, 2024, 470
  • [22] Cross-Domain Scene Unsupervised Learning Segmentation With Dynamic Subdomains
    He, Pei
    Jiao, Licheng
    Liu, Fang
    Liu, Xu
    Shang, Ronghua
    Wang, Shuang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6770 - 6784
  • [23] PMDA: Domain Alignment with Prototype Matching for Cross-Domain Adaptive Segmentation
    Li, Weiwei
    Ren, Yuanyuan
    Liu, Junzhuo
    Wang, Chenyang
    Zheng, Yuchen
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2339 - 2344
  • [24] CONTRAST UNCERTAINTY DOMAIN ALIGNMENT FOR CROSS-DOMAIN PANCREATIC IMAGE SEGMENTATION
    Fan, Ligang
    Bian, Yun
    Zhu, Weifang
    Shi, Fei
    Chen, Xinjian
    Shao, Chengwei
    Xiang, Dehui
    2023 IEEE 20TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI, 2023,
  • [25] Joint alignment of the distribution in input and feature space for cross-domain aerial image semantic segmentation
    Chen, Zhe
    Yang, Bisheng
    Ma, Ailong
    Peng, Mingjun
    Li, Haiting
    Chen, Tao
    Chen, Chi
    Dong, Zhen
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2022, 115
  • [26] Cross-Domain Rumor Detection based on Dual-Modal Domain Alignment
    Liu, Danni
    Liu, Bo
    Chen, Yida
    Wu, Wanmeng
    Cao, Jiuxin
    Hou, Yiwen
    2024 9TH INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING, ICSIP, 2024, : 544 - 548
  • [27] Area-keywords cross-modal alignment for referring image segmentation
    Zhang, Huiyong
    Wang, Lichun
    Li, Shuang
    Xu, Kai
    Yin, Baocai
    NEUROCOMPUTING, 2024, 581
  • [28] Heterogeneous Feature Fusion and Cross-modal Alignment for Composed Image Retrieval
    Zhang, Gangjian
    Wei, Shikui
    Pang, Huaxin
    Zhao, Yao
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5353 - 5362
  • [29] SAFENet: Semantic-Aware Feature Enhancement Network for unsupervised cross-domain road scene segmentation
    Ren, Dexin
    Li, Minxian
    Wang, Shidong
    Ren, Mingwu
    Zhang, Haofeng
    IMAGE AND VISION COMPUTING, 2024, 152
  • [30] CMPFFNet: Cross-Modal and Progressive Feature Fusion Network for RGB-D Indoor Scene Semantic Segmentation
    Zhou, Wujie
    Xiao, Yuxiang
    Yan, Weiqing
    Yu, Lu
    IEEE Transactions on Automation Science and Engineering, 2023, : 1 - 11