Semantic-Aware Data Augmentation for Text-to-Image Synthesis

Cited by: 0
Authors
Tan, Zhaorui [1 ,2 ]
Yang, Xi [1 ]
Huang, Kaizhu [3 ]
Affiliations
[1] Xi'an Jiaotong-Liverpool University, Department of Intelligent Science, Suzhou, China
[2] University of Liverpool, Department of Computer Science, Liverpool, Merseyside, England
[3] Duke Kunshan University, Data Science Research Center, Suzhou, China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data augmentation has recently been leveraged as an effective regularizer in various vision-language deep neural networks. However, in text-to-image synthesis (T2Isyn), current augmentation approaches still suffer from semantic mismatch between augmented paired data. Even worse, semantic collapse may occur when generated images are less semantically constrained. In this paper, we develop a novel Semantic-aware Data Augmentation (SADA) framework dedicated to T2Isyn. In particular, we propose to augment texts in the semantic space via an Implicit Textual Semantic Preserving Augmentation, in conjunction with a specifically designed Image Semantic Regularization Loss that serves as Generated Image Semantic Conservation, to cope with both semantic mismatch and collapse. As a major contribution, we theoretically show that Implicit Textual Semantic Preserving Augmentation certifies better text-image consistency, while the Image Semantic Regularization Loss, by regularizing the semantics of generated images, avoids semantic collapse and enhances image quality. Extensive experiments validate that SADA enhances text-image consistency and significantly improves image quality across T2Isyn models with various backbones. In particular, incorporating SADA when fine-tuning Stable Diffusion models also yields performance improvements.
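The abstract gives no implementation details, so the following minimal PyTorch-style sketch only illustrates the two ideas it names, under the assumption that the textual augmentation is a small perturbation of the text embedding in semantic space and that the image-side loss pulls generated-image semantics toward the (augmented) text semantics. All names here (ita_augment, image_semantic_regularization, lambda_isr) and the cosine-based formulation are hypothetical illustrations, not the authors' actual method.

    import torch
    import torch.nn.functional as F

    def ita_augment(text_emb: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
        # Illustrative textual augmentation in the semantic space: add small
        # Gaussian noise so the augmented embedding stays close to the original,
        # approximating a semantic-preserving perturbation.
        return text_emb + sigma * torch.randn_like(text_emb)

    def image_semantic_regularization(gen_img_emb: torch.Tensor,
                                      text_emb: torch.Tensor) -> torch.Tensor:
        # Illustrative image-side regularization: penalize semantic drift of the
        # generated image's embedding away from the conditioning text embedding.
        return 1.0 - F.cosine_similarity(gen_img_emb, text_emb, dim=-1).mean()

    # Illustrative use inside a generic T2I training step (names hypothetical):
    #   aug_text  = ita_augment(text_encoder(captions))
    #   fake_imgs = generator(noise, aug_text)
    #   loss = gan_loss + lambda_isr * image_semantic_regularization(
    #              image_encoder(fake_imgs), aug_text)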
Pages: 5098-5107
Page count: 10