CAFA: Cross-Modal Attentive Feature Alignment for Cross-Domain Urban Scene Segmentation

Cited by: 1
Authors:
Liu, Peng [1 ]
Ge, Yanqi [2 ]
Duan, Lixin [1 ,3 ]
Li, Wen [2 ]
Lv, Fengmao [4 ,5 ]
Affiliations:
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China
[3] Univ Elect Sci & Technol China, Sichuan Prov Peoples Hosp, Chengdu 610032, Peoples R China
[4] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[5] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transp, Chengdu 611756, Peoples R China
Funding:
National Natural Science Foundation of China
Keywords:
Task analysis; Semantic segmentation; Feature extraction; Training; Transformers; Estimation; Adaptation models; Autonomous vehicles; Domain adaptation
DOI:
10.1109/TII.2024.3412006
Chinese Library Classification (CLC):
TP [Automation Technology; Computer Technology]
Discipline code:
0812
Abstract:
Autonomous driving systems rely heavily on semantic segmentation models for accurate and safe decision-making. High segmentation performance in real-world urban scenes is crucial for autonomous vehicles, yet training such models requires substantial pixel-level labels. Unsupervised domain adaptation (UDA) techniques are therefore widely used to adapt a segmentation model trained on synthetic data (i.e., the source domain) to real-world data (i.e., the target domain), since pixel-level annotations are easy to obtain in a synthetic environment. Recently, a growing number of UDA approaches have promoted cross-domain semantic segmentation (CDSS) by fusing depth information into the RGB features. However, feature fusion does not necessarily eliminate the domain-specific components of the RGB features, which can therefore remain influenced by domain-specific information. To address this, we propose a novel cross-modal attentive feature alignment (CAFA) framework for CDSS, which provides an explicit way of using depth information to align the backbone RGB features of both domains in a nonadversarial manner. In particular, since the depth modality is less affected by the domain gap, we employ depth as an intermediate modality and align the RGB features by attending them to the depth modality through an auxiliary multimodal segmentation task. Our CAFA achieves state-of-the-art performance on benchmark tasks such as SYNTHIA -> Cityscapes and Grand Theft Auto (GTA) -> Cityscapes.
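To make the alignment mechanism concrete, the PyTorch sketch below illustrates one plausible reading of the abstract: RGB backbone features act as attention queries over depth-derived keys and values, and the depth-attended output feeds an auxiliary segmentation head. This is a minimal sketch under stated assumptions, not the authors' released implementation; the module names (CrossModalAttention, AuxSegHead), the single-layer design, and all hyperparameters are illustrative.

import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    # Queries come from the RGB features; keys/values come from the depth
    # features, so the output is anchored to the less domain-sensitive depth
    # modality. (Hypothetical module; the paper's exact architecture may differ.)
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, depth_feat: (B, C, H, W) backbone feature maps.
        b, c, h, w = rgb_feat.shape
        q = rgb_feat.flatten(2).transpose(1, 2)     # (B, H*W, C) queries from RGB
        kv = depth_feat.flatten(2).transpose(1, 2)  # (B, H*W, C) keys/values from depth
        out, _ = self.attn(self.norm(q), kv, kv)    # RGB attends to depth
        out = out + q                               # residual keeps the RGB content
        return out.transpose(1, 2).reshape(b, c, h, w)

class AuxSegHead(nn.Module):
    # Auxiliary multimodal segmentation head: supervising the depth-attended
    # features with segmentation labels (on the source domain; target-domain
    # supervision in UDA typically uses pseudo-labels) implicitly pulls the
    # RGB features of both domains toward the shared depth modality.
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, fused_feat: torch.Tensor) -> torch.Tensor:
        return self.classifier(fused_feat)  # (B, num_classes, H, W) logits

Because the auxiliary objective is an ordinary segmentation loss on the depth-attended features rather than a discriminator game, the alignment is nonadversarial, matching the framing in the abstract.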
Pages: 11666-11675
Page count: 10
Related papers (50 in total; 10 shown below):
  • [1] Cross-modal & Cross-domain Learning for Unsupervised LiDAR Semantic Segmentation
    Chen, Yiyang
    Zhao, Shanshan
    Ding, Changxing
    Tang, Liyao
    Wang, Chaoyue
    Tao, Dacheng
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 3866-3875
  • [2] Cross-domain Cross-modal Food Transfer
    Zhu, Bin
    Ngo, Chong-Wah
    Chen, Jing-jing
PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM '20, 2020: 3762-3770
  • [3] Cross-Domain and Cross-Modal Knowledge Distillation in Domain Adaptation for 3D Semantic Segmentation
    Li, Miaoyu
    Zhang, Yachao
    Xie, Yuan
    Gao, Zuodong
    Li, Cuihua
    Zhang, Zhizhong
    Qu, Yanyun
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 3829-3837
  • [4] Prompt Learning with Cross-Modal Feature Alignment for Visual Domain Adaptation
    Liu, Jinxing
    Xiao, Junjin
    Ma, Haokai
    Li, Xiangxian
    Qi, Zhuang
    Meng, Xiangxu
    Meng, Lei
ARTIFICIAL INTELLIGENCE, CICAI 2022, PT I, 2022, 13604: 416-428
  • [5] Cross-Domain Semantic Segmentation of Urban Scenes via Multi-Level Feature Alignment
    Zhang, Bin
    Zhao, Shengjie
    Zhang, Rongqing
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 1912-1917
  • [6] Cross-Domain Transfer Hashing for Efficient Cross-Modal Retrieval
    Li, Fengling
    Wang, Bowen
    Zhu, Lei
    Li, Jingjing
    Zhang, Zheng
    Chang, Xiaojun
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(10): 9664-9677
  • [7] Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment
    Huang, Po-Yao
    Kang, Guoliang
    Liu, Wenhe
    Chang, Xiaojun
    Hauptmann, Alexander G.
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019: 1758-1767
  • [8] Cross-Modal and Cross-Domain Knowledge Transfer for Label-Free 3D Segmentation
    Zhang, Jingyu
    Yang, Huitong
    Wu, Dai-Jie
    Keung, Jacky
    Li, Xuesong
    Zhu, Xinge
    Ma, Yuexin
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT III, 2024, 14427: 465-477
  • [9] Cross-Modal Cross-Domain Dual Alignment Network for RGB-Infrared Person Re-Identification
    Fu, Xiaowei
    Huang, Fuxiang
    Zhou, Yuhang
    Ma, Huimin
    Xu, Xin
    Zhang, Lei
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32(10): 6874-6887
  • [10] Cross-Domain Few-Shot Hyperspectral Image Classification With Cross-Modal Alignment and Supervised Contrastive Learning
    Li, Zhaokui
    Zhang, Chenyang
    Wang, Yan
    Li, Wei
    Du, Qian
    Fang, Zhuoqun
    Chen, Yushi
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62: 1-19