Semantic-Aware Data Augmentation for Text-to-Image Synthesis

Authors
Tan, Zhaorui [1 ,2 ]
Yang, Xi [1 ]
Huang, Kaizhu [3 ]
Affiliations
[1] Xian Jiaotong Liverpool Univ, Dept Intelligent Sci, Suzhou, Peoples R China
[2] Univ Liverpool, Dept Comp Sci, Liverpool, Merseyside, England
[3] Duke Kunshan Univ, Data Sci Res Ctr, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data augmentation has recently been leveraged as an effective regularizer in various vision-language deep neural networks. However, in text-to-image synthesis (T2Isyn), current augmentation practice still suffers from semantic mismatch between augmented paired data. Even worse, semantic collapse may occur when generated images are less semantically constrained. In this paper, we develop a novel Semantic-aware Data Augmentation (SADA) framework dedicated to T2Isyn. In particular, we propose to augment texts in the semantic space via an Implicit Textual Semantic Preserving Augmentation, in conjunction with a specifically designed Image Semantic Regularization Loss that serves as Generated Image Semantic Conservation, to cope with semantic mismatch and collapse. As one major contribution, we theoretically show that Implicit Textual Semantic Preserving Augmentation can certify better text-image consistency, while the Image Semantic Regularization Loss, by regularizing the semantics of generated images, avoids semantic collapse and enhances image quality. Extensive experiments validate that SADA significantly enhances text-image consistency and improves image quality in T2Isyn models across various backbones. Notably, incorporating SADA during the tuning of Stable Diffusion models also yields performance improvements.
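As a rough illustration of what augmenting texts "in the semantic space" can look like, the sketch below perturbs text embeddings with small Gaussian noise scaled by each dimension's spread across the batch. This is a generic embedding-space augmentation written under our own assumptions, not the paper's Implicit Textual Semantic Preserving Augmentation; the function names, the `strength` parameter, and the random stand-in embeddings are all hypothetical.

```python
import numpy as np


def augment_text_embeddings(embeddings, strength=0.1, rng=None):
    """Add Gaussian noise scaled by the per-dimension std of the batch.

    Assumption: `embeddings` is an (n_texts, dim) array from some text
    encoder. `strength` is a hypothetical knob for the noise magnitude.
    """
    rng = np.random.default_rng() if rng is None else rng
    embeddings = np.asarray(embeddings, dtype=np.float64)
    # Per-dimension standard deviation estimated from the batch itself.
    std = embeddings.std(axis=0, keepdims=True)
    noise = rng.standard_normal(embeddings.shape) * std * strength
    return embeddings + noise


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n, dim) arrays."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return num / den


# Random vectors stand in for real text-encoder outputs.
rng = np.random.default_rng(0)
text_emb = rng.standard_normal((8, 16))
aug_emb = augment_text_embeddings(text_emb, strength=0.1, rng=rng)

# A small perturbation should keep each augmented embedding close to
# its source, which is the intuition behind "semantic preserving".
sims = cosine_similarity(text_emb, aug_emb)
print(sims.round(3))
```

Keeping `strength` small keeps the augmented embedding near the original, which is one informal way to think about avoiding the semantic mismatch between augmented pairs that the abstract describes.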
Pages: 5098-5107
Number of pages: 10