SEMACOL: Semantic-enhanced multi-scale approach for text-guided grayscale image colorization

Cited by: 0
Authors
Niu, Chaochao [1]
Tao, Ming [1]
Bao, Bing-Kun [1]
Affiliations
[1] Nanjing Univ Posts & Telecommun, 66 New Mofan Rd, Nanjing 210003, Jiangsu, Peoples R China
Keywords
Text-guided image colorization; Cross-modal semantic enhancement; Multi-scale features; SKETCH
DOI
10.1016/j.patcog.2024.111203
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High-quality colorization of grayscale images using text descriptions presents a significant challenge, especially in accurately coloring small objects. Existing methods have two major flaws. First, text descriptions typically omit the sizes of objects, so the resulting text features often lack semantic information reflecting object size. Second, these methods identify coloring areas by relying solely on low-resolution visual features from the Unet encoder and fail to effectively leverage the fine-grained information provided by high-resolution visual features. To address these issues, we introduce the Semantic-Enhanced Multi-scale Approach for Text-Guided Grayscale Image Colorization (SEMACOL). We first introduce a Cross-Modal Text Augmentation module that incorporates grayscale images into text features, which enables accurate perception of object sizes in text descriptions. Subsequently, we propose a Multi-scale Content Location module, which utilizes multi-scale features to precisely identify coloring areas within grayscale images. Meanwhile, we incorporate a Text-Influenced Colorization Adjustment module to adjust colorization based on text descriptions. Finally, we implement a Dynamic Feature Fusion Strategy, which dynamically refines outputs from both the Multi-scale Content Location and Text-Influenced Colorization Adjustment modules, ensuring a coherent colorization process. SEMACOL improves on existing state-of-the-art methods on public datasets. Specifically, SEMACOL achieves a PSNR of 25.695, SSIM of 0.92240, LPIPS of 0.156, and FID of 17.54, surpassing the previous best results (PSNR: 25.511, SSIM: 0.92104, LPIPS: 0.157, FID: 26.93). The code will be available at https://github.com/ChchNiu/SEMACOL.
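For readers unfamiliar with the first metric reported in the abstract, the following is a minimal sketch of how PSNR is conventionally computed between a ground-truth image and a colorized result. It is the standard textbook definition, not the authors' evaluation code; the function name `psnr`, the flat-list image representation, the 8-bit `max_value` default, and the toy pixel values are all assumptions made for illustration.

```python
import math


def psnr(reference, result, max_value=255.0):
    """Peak signal-to-noise ratio (in dB) between two equally sized images.

    Both images are flat sequences of pixel intensities (hypothetical
    representation chosen to keep the sketch dependency-free).
    """
    if len(reference) != len(result) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, result)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 example with made-up pixel values (not data from the paper).
ground_truth = [52, 55, 61, 59]
colorized = [54, 55, 60, 62]
score = psnr(ground_truth, colorized)
```

A higher PSNR means the colorized output is closer to the ground truth in a pixel-wise sense, which is why the abstract presents 25.695 versus 25.511 as an improvement.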
Pages: 11