SEMACOL: Semantic-enhanced multi-scale approach for text-guided grayscale image colorization

Cited by: 0
|
Authors
Niu, Chaochao [1 ]
Tao, Ming [1 ]
Bao, Bing-Kun [1 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, 66 New Mofan Rd, Nanjing 210003, Jiangsu, Peoples R China
Keywords
Text-guided image colorization; Cross-modal semantic enhancement; Multi-scale features; SKETCH;
DOI
10.1016/j.patcog.2024.111203
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
High-quality colorization of grayscale images from text descriptions remains a significant challenge, especially the accurate coloring of small objects. Existing methods have two major flaws. First, text descriptions typically omit object size information, so the resulting text features often lack semantics that reflect object sizes. Second, these methods identify coloring areas by relying solely on low-resolution visual features from the U-Net encoder and fail to effectively leverage the fine-grained information provided by high-resolution visual features. To address these issues, we introduce the Semantic-Enhanced Multi-scale Approach for Text-Guided Grayscale Image Colorization (SEMACOL). We first introduce a Cross-Modal Text Augmentation module that incorporates grayscale images into text features, enabling accurate perception of object sizes from text descriptions. Subsequently, we propose a Multi-scale Content Location module, which uses multi-scale features to precisely identify coloring areas within grayscale images. Meanwhile, we incorporate a Text-Influenced Colorization Adjustment module to effectively adjust colorization based on text descriptions. Finally, we implement a Dynamic Feature Fusion Strategy, which dynamically refines the outputs of the Multi-scale Content Location and Text-Influenced Colorization Adjustment modules, ensuring a coherent colorization process. SEMACOL demonstrates remarkable performance improvements over existing state-of-the-art methods on public datasets. Specifically, SEMACOL achieves a PSNR of 25.695, SSIM of 0.92240, LPIPS of 0.156, and FID of 17.54, surpassing the previous best results (PSNR: 25.511, SSIM: 0.92104, LPIPS: 0.157, FID: 26.93). The code will be available at https://github.com/ChchNiu/SEMACOL.
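The abstract describes a Dynamic Feature Fusion Strategy that blends the outputs of the Multi-scale Content Location and Text-Influenced Colorization Adjustment modules. The record does not give the actual formulation, so the following is only a minimal illustrative sketch of one common way such dynamic fusion is done: a per-element sigmoid gate weighting the two module outputs. All function names, and the gated form itself, are assumptions, not the paper's method.

```python
import math

def sigmoid(x):
    # Standard logistic function, mapping a logit to a weight in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def dynamic_fusion(location_feat, adjustment_feat, gate_logits):
    """Blend two feature vectors element-wise with sigmoid gates.

    location_feat   : hypothetical output of a content-location module
    adjustment_feat : hypothetical output of a colorization-adjustment module
    gate_logits     : per-element logits controlling the blend
    """
    fused = []
    for loc, adj, g in zip(location_feat, adjustment_feat, gate_logits):
        w = sigmoid(g)                       # gate weight in (0, 1)
        fused.append(w * loc + (1.0 - w) * adj)
    return fused

# Toy example: a zero logit gives an even 0.5/0.5 blend of the two inputs.
print(dynamic_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0]))  # [2.0, 3.0]
```

In a real network the gate logits would themselves be predicted from the two feature maps, so the blend adapts per pixel rather than being fixed.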
Pages: 11