Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation

Authors
Quang Nguyen [1, 2]
Truong Vu [1]
Anh Tran [1]
Khoi Nguyen [1]
Affiliations
[1] VinAI Res, Ho Chi Minh City, Vietnam
[2] Ho Chi Minh City Univ Technol, VNU HCM, Ho Chi Minh City, Vietnam
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Preparing training data for deep vision models is a labor-intensive task. To address this, generative models have emerged as an effective solution for generating synthetic data. While current generative models produce image-level category labels, we propose a novel method for generating pixel-level semantic segmentation labels using the text-to-image generative model Stable Diffusion (SD). By utilizing the text prompts, cross-attention, and self-attention of SD, we introduce three new techniques: class-prompt appending, class-prompt cross-attention, and self-attention exponentiation. These techniques enable us to generate segmentation maps corresponding to synthetic images. These maps serve as pseudo-labels for training semantic segmenters, eliminating the need for labor-intensive pixel-wise annotation. To account for the imperfections in our pseudo-labels, we incorporate uncertainty regions into the segmentation, allowing us to disregard loss from those regions. We conduct evaluations on two datasets, PASCAL VOC and MSCOCO, and our approach significantly outperforms concurrent work. Our benchmarks and code will be released at https://github.com/VinAIResearch/Dataset-Diffusion.
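The abstract names self-attention exponentiation and uncertainty regions but does not give formulas, so the following is only a minimal NumPy sketch of how such a pipeline could look, not the authors' implementation. It assumes that cross-attention scores are refined by multiplying them with a matrix power of the self-attention map, and that pixels whose best score falls between two thresholds are marked uncertain and excluded from the loss. The function names, the thresholds `low` and `high`, the per-class min-max normalisation, and the `IGNORE_LABEL` value of 255 are all hypothetical choices made for illustration.

```python
import numpy as np

IGNORE_LABEL = 255  # hypothetical marker for uncertain pixels (assumption)


def refine_cross_attention(cross_attn, self_attn, power=2):
    """Propagate class scores along a matrix power of the self-attention map.

    cross_attn: (N, C) scores of N pixels for C class tokens.
    self_attn:  (N, N) pixel-to-pixel attention weights.
    """
    return np.linalg.matrix_power(self_attn, power) @ cross_attn


def pseudo_labels(scores, low=0.3, high=0.6):
    """Convert (N, C) class scores into labels with an uncertainty band.

    Label 0 is background, class c becomes label c + 1, and pixels whose
    best normalised score lies in [low, high) receive IGNORE_LABEL.
    """
    # Min-max normalise each class map so one pair of thresholds applies.
    lo = scores.min(axis=0, keepdims=True)
    span = np.maximum(scores.max(axis=0, keepdims=True) - lo, 1e-8)
    norm = (scores - lo) / span
    best = norm.max(axis=1)
    labels = norm.argmax(axis=1) + 1
    labels[best < low] = 0
    labels[(best >= low) & (best < high)] = IGNORE_LABEL
    return labels


def masked_cross_entropy(log_probs, labels):
    """Mean negative log-likelihood over pixels not marked IGNORE_LABEL.

    log_probs: (N, C + 1) log-probabilities, column 0 being background.
    """
    keep = labels != IGNORE_LABEL
    if not keep.any():
        return 0.0
    rows = np.arange(labels.size)[keep]
    return float(-log_probs[rows, labels[keep]].mean())
```

With an identity self-attention map the refinement leaves the scores unchanged, which makes the thresholding step easy to inspect in isolation before plugging in attention maps extracted from a real model.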
Pages: 21