Physically-guided open vocabulary segmentation with weighted patched alignment loss

Times cited: 1

Authors:

Liu, Weide [1]

Lou, Jieming [2]

Wang, Xingxing [3]

Zhou, Wei [4]

Cheng, Jun [3]

Yang, Xulei [3]

Affiliations:

[1] Harvard Med Sch, Boston, MA USA

[2] Natl Univ Singapore, Singapore, Singapore

[3] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore, Singapore

[4] Univ Wales Coll Cardiff, Cardiff, Wales

Source:

NEUROCOMPUTING | 2025 / Vol. 614

Keywords:

Physics-informed; Open vocabulary; Segmentation; Patched alignment loss

DOI:

10.1016/j.neucom.2024.128788

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Open vocabulary segmentation is a challenging task that aims to segment objects from thousands of categories, including categories unseen during training. Directly applying CLIP to open-vocabulary semantic segmentation is difficult because of the granularity gap between its image-level contrastive learning and the pixel-level recognition that segmentation requires. To close this gap, we propose a unified pipeline that uses physical structure regularization to enhance the generalizability and robustness of open vocabulary segmentation. Physical structure information is independent of the training data, so incorporating it aims to reduce bias and improve the model's performance on unseen classes. We use low-level structures such as edges and keypoints as regularization terms, because they are easy to obtain and strongly correlated with segmentation boundaries. These structures serve as pseudo-ground truth to supervise the model. Furthermore, inspired by the effectiveness of comparative learning in human cognition, we introduce the weighted patched alignment loss. This loss contrasts similar and dissimilar samples to learn low-dimensional representations that capture the distinctions between object classes. By combining physical knowledge with the weighted patched alignment loss, we aim to improve the model's generalizability, robustness, and ability to recognize diverse object classes. Experiments on the COCO Stuff, Pascal VOC, Pascal Context-59, Pascal Context-459, ADE20K-150, and ADE20K-847 datasets show that the proposed method consistently improves over the baselines and achieves new state-of-the-art results on the open vocabulary segmentation task.
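The abstract does not give a formula for the weighted patched alignment loss. The sketch below is only a rough illustration of one common way to write a weighted patch-to-text contrastive loss; it is not the paper's definition. Each patch embedding is pulled toward the text embedding of its own class and pushed away from the embeddings of all other classes, in an InfoNCE-style cross-entropy. Each patch's term is then scaled by a per-patch weight. The function names, the InfoNCE form, the temperature of 0.07, and the choice of weights are all hypothetical assumptions.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors to unit length so that dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def weighted_patch_alignment_loss(
    patch_emb: np.ndarray,
    text_emb: np.ndarray,
    patch_labels: np.ndarray,
    patch_weights: np.ndarray,
    temperature: float = 0.07,
) -> float:
    """Hypothetical weighted patch-to-text contrastive (InfoNCE-style) loss.

    ASSUMPTION: illustrative only; the paper's exact loss is not given in the abstract.

    patch_emb:     (N, D) visual embeddings, one per image patch
    text_emb:      (C, D) text embeddings, one per class name
    patch_labels:  (N,)   integer class index of each patch (its positive text embedding)
    patch_weights: (N,)   non-negative weight applied to each patch's loss term
    """
    p = l2_normalize(patch_emb)
    t = l2_normalize(text_emb)
    # (N, C) scaled cosine similarities: the positive class vs. all other classes (negatives).
    logits = p @ t.T / temperature
    # Subtract each row's maximum before exponentiating, for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    labels = np.asarray(patch_labels)
    # Per-patch negative log-likelihood of the correct class.
    nll = -log_probs[np.arange(len(labels)), labels]
    w = np.asarray(patch_weights, dtype=float)
    # Weighted mean over patches; patches with zero weight are ignored.
    return float((w * nll).sum() / max(w.sum(), 1e-8))


if __name__ == "__main__":
    text = np.eye(3)  # three toy class embeddings
    patches = np.eye(3)[[0, 1, 2, 0]]  # four patches, each identical to one class embedding
    print(weighted_patch_alignment_loss(patches, text, np.array([0, 1, 2, 0]), np.ones(4)))
    print(weighted_patch_alignment_loss(patches, text, np.array([1, 2, 0, 1]), np.ones(4)))
```

In this toy run, the first call should print a loss near zero because every patch matches its labeled class. The second call should print a large loss because every patch is paired with the wrong class. The per-patch weights give one place to emphasize some patches over others, for example patches near edges. How the paper actually sets these weights is not stated in the abstract.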

Pages: 10