Physically-guided open vocabulary segmentation with weighted patched alignment loss

Cited by: 1
Authors
Liu, Weide [1]
Lou, Jieming [2]
Wang, Xingxing [3]
Zhou, Wei [4]
Cheng, Jun [3]
Yang, Xulei [3]
Affiliations
[1] Harvard Med Sch, Boston, MA USA
[2] Natl Univ Singapore, Singapore, Singapore
[3] Agcy Sci Technol & Res, Inst Infocomm Res, Singapore, Singapore
[4] Univ Wales Coll Cardiff, Cardiff, Wales
Keywords
Physics-informed; Open vocabulary; Segmentation; Patched alignment loss
DOI
10.1016/j.neucom.2024.128788
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Open vocabulary segmentation is a challenging task that aims to segment objects from thousands of categories not seen during training. Directly applying CLIP to open-vocabulary semantic segmentation is difficult because of the granularity gap between its image-level contrastive learning and the pixel-level recognition that segmentation requires. To address these challenges, we propose a unified pipeline that leverages physical structure regularization to enhance the generalizability and robustness of open vocabulary segmentation. Because physical structure information is independent of the training data, incorporating it is intended to reduce bias and improve the model's performance on unseen classes. We use low-level structures such as edges and keypoints as regularization terms, since they are easy to obtain and strongly correlated with segmentation boundary information. These structures serve as pseudo-ground truth to supervise the model. Furthermore, inspired by the effectiveness of comparative learning in human cognition, we introduce the weighted patched alignment loss. This loss function contrasts similar and dissimilar samples to learn low-dimensional representations that capture the distinctions between object classes. By combining physical knowledge with the weighted patched alignment loss, we aim to improve the model's generalizability, robustness, and ability to recognize diverse object classes. Experiments on the COCO Stuff, Pascal VOC, Pascal Context-59, Pascal Context-459, ADE20K-150, and ADE20K-847 datasets demonstrate that the proposed method consistently improves over baselines and achieves new state-of-the-art results on the open vocabulary segmentation task.
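The abstract states that low-level structures such as edges are used as pseudo-ground truth, but this record does not say how they are extracted. The following is a minimal sketch of one common way to derive an edge map, assuming a Sobel filter and a fixed threshold. The function names `sobel_magnitude` and `edge_pseudo_label`, the choice of Sobel, and the threshold value are illustrative assumptions and are not taken from the paper.

```python
# Illustrative sketch only: Sobel edges as a pseudo-ground-truth boundary map.
# The edge detector, function names, and threshold are assumptions, not the paper's method.

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(image):
    """Return the gradient magnitude of a 2D grayscale image (list of rows).

    Border pixels are left at 0.0 because the 3x3 window does not fit there.
    """
    height, width = len(image), len(image[0])
    magnitude = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            magnitude[y][x] = (gx * gx + gy * gy) ** 0.5
    return magnitude


def edge_pseudo_label(image, threshold=1.0):
    """Binarize the Sobel magnitude into a 0/1 edge map (hypothetical threshold)."""
    return [[1 if value >= threshold else 0 for value in row]
            for row in sobel_magnitude(image)]


# Toy image: dark left half, bright right half, so the edge runs vertically.
toy_image = [[0, 0, 1, 1] for _ in range(4)]
edge_map = edge_pseudo_label(toy_image)
```

In a training setup, a map like `edge_map` could be compared against the boundaries of a predicted segmentation mask as an auxiliary regularization term. How the paper weights or combines such a term is not described in this record.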
Pages: 10
Related papers
3 items in total
  • [1] Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation
    Fang, Hao
    Wu, Peng
    Li, Yawei
    Zhang, Xinxin
    Lu, Xiankai
    COMPUTER VISION - ECCV 2024, PT LXX, 2025, 15128 : 225 - 241
  • [2] CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
    Ma, Chuofan
    Jiang, Yi
    Wen, Xin
    Yuan, Zehuan
    Qi, Xiaojuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] CLIP-TSA: CLIP-guided open-vocabulary semantic segmentation with two-level semantic awareness
    Liang, Zhixue
    Dong, Wenyong
    Zhang, Bo
    MULTIMEDIA SYSTEMS, 2025, 31 (01)