CAFA: Cross-Modal Attentive Feature Alignment for Cross-Domain Urban Scene Segmentation

Cited by: 1
Authors
Liu, Peng [1 ]
Ge, Yanqi [2 ]
Duan, Lixin [1 ,3 ]
Li, Wen [2 ]
Lv, Fengmao [4 ,5 ]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[3] Univ Elect Sci & Technol China, Sichuan Prov Peoples Hosp, Chengdu 610032, Peoples R China
[4] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[5] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transp, Chengdu 611756, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Semantic segmentation; Feature extraction; Training; Transformers; Estimation; Adaptation models; Autonomous vehicles; Domain adaptation
DOI
10.1109/TII.2024.3412006
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Autonomous driving systems rely heavily on semantic segmentation models for accurate and safe decision-making. High segmentation performance in real-world urban scenes is crucial for autonomous vehicles, but training such models requires large amounts of pixel-level labels. Unsupervised domain adaptation (UDA) techniques are widely used to adapt a segmentation model trained on synthetic data (i.e., the source domain) to real-world data (i.e., the target domain), since obtaining pixel-level annotations is fairly easy in a synthetic environment. Recently, a growing number of UDA approaches improve cross-domain semantic segmentation (CDSS) by fusing depth information into the RGB features. However, feature fusion does not necessarily eliminate the domain-specific components of the RGB features, so the fused features can still be influenced by domain-specific information. To address this, we propose a novel cross-modal attentive feature alignment (CAFA) framework for CDSS, which explicitly uses depth information to align the main-backbone RGB features of both domains in a nonadversarial manner. In particular, considering that the depth modality is less affected by the domain gap, we employ depth as an intermediate modality and align the RGB features by attending them to the depth modality via an auxiliary multimodal segmentation task. CAFA achieves state-of-the-art performance on benchmark tasks such as Synthia -> Cityscapes and Grand Theft Auto (GTA) -> Cityscapes.
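The abstract does not specify the attention mechanism, so the following is only a minimal sketch of what "attending RGB features to the depth modality" could look like, assuming generic scaled dot-product cross-attention with RGB tokens as queries and depth tokens as keys and values. The function names `softmax` and `cross_modal_attention`, the toy inputs, and the omission of learned projections are all assumptions made for illustration; this is not the CAFA architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(rgb_tokens, depth_tokens):
    """Attend each RGB token (query) to all depth tokens (keys and values).

    Both inputs are lists of equal-length feature vectors. Learned
    projections are omitted (assumption): this is a generic illustration,
    not the method described in the paper.
    """
    dim = len(rgb_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for query in rgb_tokens:
        # Similarity of this RGB token to every depth token.
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in depth_tokens
        ]
        weights = softmax(scores)
        # Output is a convex combination of the depth tokens.
        attended.append([
            sum(w * value[j] for w, value in zip(weights, depth_tokens))
            for j in range(dim)
        ])
    return attended


# Hypothetical toy features: 2 RGB tokens, 3 depth tokens, dimension 2.
rgb = [[1.0, 0.0], [0.0, 1.0]]
depth = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
out = cross_modal_attention(rgb, depth)
```

Each output vector is a weighted average of depth tokens, so in this sketch the attended representation lives in the depth feature space, which the abstract describes as less affected by the domain gap.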
Pages: 11666 - 11675
Page count: 10
Related papers
50 in total
  • [41] Multi-Modal Pulmonary Mass Segmentation Network Based on Cross-Modal Spatial Alignment
    LI Jiaxin
    CHEN Houjin
    PENG Yahui
    LI Yanfeng
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2022, 44 (01) : 11 - 17
  • [42] Domain organisation emerges in cross-modal but not within-modal semantic feature integration
    Smith, Gregory J.
    McNorgan, Chris
    LANGUAGE COGNITION AND NEUROSCIENCE, 2023, 38 (05) : 672 - 692
  • [43] Token Embeddings Alignment for Cross-Modal Retrieval
    Xie, Chen-Wei
    Wu, Jianmin
    Zheng, Yun
    Pan, Pan
    Hua, Xian-Sheng
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4555 - 4563
  • [44] Cross-modal Variational Alignment of Latent Spaces
    Theodoridis, Thomas
    Chatzis, Theocharis
    Solachidis, Vassilios
    Dimitropoulos, Kosmas
    Daras, Petros
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 4127 - 4136
  • [45] HCNet: Hierarchical Feature Aggregation and Cross-Modal Feature Alignment for Remote Sensing Image Captioning
    Yang, Zhigang
    Li, Qiang
    Yuan, Yuan
    Wang, Qi
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 11
  • [46] Neural entity alignment with cross-modal supervision
    Su, Fenglong
    Xu, Chengjin
    Yang, Han
    Chen, Zhongwu
    Jing, Ning
    INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (02)
  • [47] Adequate alignment and interaction for cross-modal retrieval
    Mingkang WANG
    Min MENG
    Jigang LIU
    Jigang WU
    VIRTUAL REALITY & INTELLIGENT HARDWARE, 2023, 5 (06) : 509 - 522
  • [48] Cross-Modal Translation and Alignment for Survival Analysis
    Zhou, Fengtao
    Chen, Hao
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21428 - 21437
  • [49] Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval
    Fang, Xiang
    Liu, Daizong
    Zhou, Pan
    Hu, Yuchong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 7517 - 7532
  • [50] Robust cross-modal retrieval with alignment refurbishment
    Guo, Jinyi
    Ding, Jieyu
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2023, 24 (10) : 1403 - 1415