CAFA: Cross-Modal Attentive Feature Alignment for Cross-Domain Urban Scene Segmentation

Cited by: 1
Authors
Liu, Peng [1]
Ge, Yanqi [2]
Duan, Lixin [1, 3]
Li, Wen [2]
Lv, Fengmao [4, 5]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[3] Univ Elect Sci & Technol China, Sichuan Prov Peoples Hosp, Chengdu 610032, Peoples R China
[4] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[5] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transp, Chengdu 611756, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Semantic segmentation; Feature extraction; Training; Transformers; Estimation; Adaptation models; Autonomous vehicles; domain adaptation; semantic segmentation;
DOI
10.1109/TII.2024.3412006
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Autonomous driving systems rely heavily on semantic segmentation models for accurate and safe decision-making. High segmentation performance in real-world urban scenes is crucial for autonomous vehicles, but training such models requires large amounts of pixel-level labels. Unsupervised domain adaptation (UDA) techniques are widely used to adapt a segmentation model trained on synthetic data (i.e., the source domain) to real-world data (i.e., the target domain), since pixel-level annotations are fairly easy to obtain in a synthetic environment. Recently, a growing number of UDA approaches have improved cross-domain semantic segmentation (CDSS) by fusing depth information into the RGB features. However, feature fusion does not necessarily eliminate the domain-specific components of the RGB features, so the fused features can still be influenced by domain-specific information. To address this, we propose a novel cross-modal attentive feature alignment (CAFA) framework for CDSS, which explicitly uses depth information to align the main-backbone RGB features of both domains in a nonadversarial manner. In particular, because the depth modality is less affected by the domain gap, we employ depth as an intermediate modality and align the RGB features by attending them to the depth modality through an auxiliary multimodal segmentation task. CAFA achieves state-of-the-art performance on benchmark tasks such as SYNTHIA → Cityscapes and Grand Theft Auto (GTA) → Cityscapes.
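The phrase "attending RGB features to the depth modality" can be illustrated with a minimal sketch. The abstract does not specify the attention form, so this example assumes plain scaled dot-product cross-attention with no learned projections: RGB tokens act as queries, depth tokens act as keys and values. The function names `softmax` and `cross_modal_attention` and the toy token values are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(rgb_tokens, depth_tokens):
    """Attend each RGB token (query) to all depth tokens (keys and values).

    Hypothetical helper: scaled dot-product attention without learned
    projections, an assumption made only for illustration.
    """
    dim = len(rgb_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for query in rgb_tokens:
        # Similarity of this RGB token to every depth token.
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in depth_tokens
        ]
        weights = softmax(scores)
        # Weighted sum of depth tokens: a depth-conditioned view of the RGB token.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, depth_tokens))
            for i in range(dim)
        ]
        attended.append(mixed)
    return attended


# Toy inputs: two RGB tokens and three depth tokens, each of dimension 2.
rgb = [[1.0, 0.0], [0.0, 1.0]]
depth = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = cross_modal_attention(rgb, depth)
```

Each output row is a convex combination of the depth tokens, weighted by how strongly the corresponding RGB token matches them. In a full model the queries, keys, and values would normally pass through learned projections, and the attended features would feed the auxiliary multimodal segmentation head that the abstract mentions; neither is modeled here.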
Pages: 11666-11675
Page count: 10
Related papers
50 items in total
  • [31] CMPFFNet: Cross-Modal and Progressive Feature Fusion Network for RGB-D Indoor Scene Semantic Segmentation
    Zhou, Wujie
    Xiao, Yuxiang
    Yan, Weiqing
    Yu, Lu
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024, 21 (04) : 5523 - 5533
  • [32] Instance Segmentation with Cross-Modal Consistency
    Zhu, Alex Zihao
    Casser, Vincent
    Mahjourian, Reza
    Kretzschmar, Henrik
    Pirk, Soren
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 2009 - 2016
  • [33] Feature selection for cross-scene hyperspectral image classification using cross-domain ReliefF
    Ye, Minchao
    Xu, Yongqiu
    Ji, Chenxi
    Chen, Hong
    Lu, Huijuan
    Qian, Yuntao
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2019, 17 (05)
  • [34] Cross-modal domain generalization semantic segmentation based on fusion features
    Yue, Wanlin
    Zhou, Zhiheng
    Cao, Yinglie
    Liuman
    KNOWLEDGE-BASED SYSTEMS, 2024, 302
  • [35] Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment
    Zhang, Ji
    Wu, Xiao
    Cheng, Zhi-Qi
    He, Qi
    Li, Wei
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 8515 - 8524
  • [36] CROSS-SCENE FEATURE SELECTION FOR HYPERSPECTRAL IMAGES BASED ON CROSS-DOMAIN INFORMATION GAIN
    Ye, Minchao
    Xu, Yongqiu
    Lu, Huijuan
    Yan, Ke
    Qian, Yuntao
    IGARSS 2018 - 2018 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2018, : 4764 - 4767
  • [37] Semisupervised Cross-Domain Remote Sensing Scene Classification via Category-Level Feature Alignment Network
    Li, Yang
    Li, Zhang
    Su, Ang
    Wang, Kun
    Wang, Zi
    Yu, Qifeng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 14
  • [38] DIVERGENCE-GUIDED FEATURE ALIGNMENT FOR CROSS-DOMAIN OBJECT DETECTION
    Li, Zongyao
    Togo, Ren
    Ogawa, Takahiro
    Haseyama, Miki
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 2240 - 2244
  • [39] AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection
    Wang, Hongsong
    Liao, Shengcai
    Shao, Ling
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 4046 - 4056
  • [40] Unsupervised Semantic Segmentation of Urban Scenes via Cross-Modal Distillation
    Vobecky, Antonin
    Hurych, David
    Simeoni, Oriane
    Gidaris, Spyros
    Bursuc, Andrei
    Perez, Patrick
    Sivic, Josef
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025,