CAFA: Cross-Modal Attentive Feature Alignment for Cross-Domain Urban Scene Segmentation

Cited by: 1

Authors:

Liu, Peng [1]

Ge, Yanqi [2]

Duan, Lixin [1,3]

Li, Wen [2]

Lv, Fengmao [4,5]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China

[2] Univ Elect Sci & Technol China, Shenzhen Inst Adv Study, Shenzhen 518110, Peoples R China

[3] Univ Elect Sci & Technol China, Sichuan Prov Peoples Hosp, Chengdu 610032, Peoples R China

[4] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China

[5] Minist Educ, Engn Res Ctr Sustainable Urban Intelligent Transp, Chengdu 611756, Peoples R China

Source:

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS | 2024 / Vol. 20 / Issue 10

Funding:

National Natural Science Foundation of China;

Keywords:

Task analysis; Semantic segmentation; Feature extraction; Training; Transformers; Estimation; Adaptation models; Autonomous vehicles; domain adaptation; semantic segmentation;

DOI:

10.1109/TII.2024.3412006

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Autonomous driving systems rely heavily on semantic segmentation models for accurate and safe decision-making. High segmentation performance in real-world urban scenes is crucial for autonomous vehicles, but model training requires a large number of pixel-level labels. Unsupervised domain adaptation (UDA) techniques are widely used to adapt a segmentation model trained on synthetic data (i.e., the source domain) to real-world data (i.e., the target domain), since obtaining pixel-level annotations is fairly easy in a synthetic environment. Recently, a growing number of UDA approaches promote cross-domain semantic segmentation (CDSS) by fusing depth information into the RGB features. However, feature fusion does not necessarily eliminate the domain-specific components in the RGB features, so the fused features can still be influenced by domain-specific information. To address this, we propose a novel cross-modal attentive feature alignment (CAFA) framework for CDSS, which provides an explicit perspective of using depth information to align the main backbone RGB features of both domains in a nonadversarial manner. In particular, considering that the depth modality is less affected by the domain gap, we employ depth as an intermediate modality and align the RGB features by attending RGB features to the depth modality through constructing an auxiliary multimodal segmentation task. CAFA achieves state-of-the-art performance on benchmark tasks such as Synthia -> Cityscapes and Grand Theft Auto (GTA) -> Cityscapes.


Pages: 11666 - 11675

Number of pages: 10

