SEMACOL: Semantic-enhanced multi-scale approach for text-guided grayscale image colorization

Authors
Niu, Chaochao [1]
Tao, Ming [1]
Bao, Bing-Kun [1]
Affiliations
[1] Nanjing Univ Posts & Telecommun, 66 New Mofan Rd, Nanjing 210003, Jiangsu, Peoples R China
Keywords
Text-guided image colorization; Cross-modal semantic enhancement; Multi-scale features; SKETCH;
DOI
10.1016/j.patcog.2024.111203
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
High-quality colorization of grayscale images using text descriptions presents a significant challenge, especially in accurately coloring small objects. Existing methods have two major flaws. First, text descriptions typically omit the sizes of objects, so the resulting text features often lack semantic information reflecting object sizes. Second, these methods identify coloring areas by relying solely on low-resolution visual features from the U-Net encoder and fail to effectively leverage the fine-grained information provided by high-resolution visual features. To address these issues, we introduce the Semantic-Enhanced Multi-scale Approach for Text-Guided Grayscale Image Colorization (SEMACOL). We first introduce a Cross-Modal Text Augmentation module that incorporates grayscale images into text features, which enables accurate perception of object sizes in text descriptions. Subsequently, we propose a Multi-scale Content Location module, which utilizes multi-scale features to precisely identify coloring areas within grayscale images. Meanwhile, we incorporate a Text-Influenced Colorization Adjustment module to effectively adjust colorization based on text descriptions. Finally, we implement a Dynamic Feature Fusion Strategy, which dynamically refines outputs from both the Multi-scale Content Location and Text-Influenced Colorization Adjustment modules, ensuring a coherent colorization process. SEMACOL demonstrates remarkable performance improvements over existing state-of-the-art methods on public datasets. Specifically, SEMACOL achieves a PSNR of 25.695, SSIM of 0.92240, LPIPS of 0.156, and FID of 17.54, surpassing the previous best results (PSNR: 25.511, SSIM: 0.92104, LPIPS: 0.157, FID: 26.93). The code will be available at https://github.com/ChchNiu/SEMACOL.
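For readers unfamiliar with the first metric reported above, the following is a minimal sketch of the standard PSNR definition, PSNR = 10 * log10(MAX^2 / MSE). It is not taken from the SEMACOL code. The function name `psnr`, the flat-sequence input, and the 8-bit peak value of 255 are assumptions made for illustration; the abstract does not state the evaluation protocol (data range, color space, or resolution) behind the reported numbers.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Illustrative helper only: `max_value` is assumed to be the peak pixel
    value (255 for 8-bit images); the paper's exact setup is not stated.
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [0.0, 0.0, 0.0, 0.0]
    colorized = [25.5, 25.5, 25.5, 25.5]  # constant error of 10% of the peak
    print(psnr(ground_truth, colorized))
```

Higher PSNR means the colorized output is closer to the ground truth pixel by pixel, so on this metric the reported gap between 25.695 and 25.511 is small; LPIPS and FID, where lower is better, capture perceptual and distributional quality instead.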
Pages: 11