Learning Scene Graph for Better Cross-Domain Image Captioning

Cited by: 0
Authors
Jia, Junhua [1]
Xin, Xiaowei [1]
Gao, Xiaoyan [1]
Ding, Xiangqian [1]
Pang, Shunpeng [2]
Affiliations
[1] Ocean Univ China, Fac Informat Sci & Engn, Shandong 266000, Peoples R China
[2] Weifang Univ, Sch Comp Engn, Shandong 261061, Peoples R China
Keywords
Image Captioning; Scene Graph; Text-to-Image Synthesis; Dual Learning
DOI
10.1007/978-981-99-8435-0_10
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Current image captioning (IC) methods achieve good results within a single domain, primarily because they are trained on large amounts of annotated data. However, the performance of single-domain methods degrades when they are applied to new domains. To address this, we propose a cross-domain image captioning framework, called SGCDIC, which achieves cross-domain generalization of image captioning models by simultaneously optimizing two coupled tasks: image captioning and text-to-image synthesis (TIS). Specifically, we propose a scene-graph-based approach, SGAT, for the image captioning task. The image synthesis task employs a GAN variant (DFGAN) to synthesize plausible images from the text descriptions generated by SGAT. We compare the generated images with the real images to enhance image captioning performance in new domains. We conduct extensive experiments to evaluate SGCDIC, using MSCOCO as the source domain data and Flickr30k and Oxford-102 as the new domain data. Comparative experiments and ablation studies demonstrate that SGCDIC achieves substantially better performance than strong competitors on the cross-domain image captioning task.
Pages: 121-137
Number of pages: 17