SEMACOL: Semantic-enhanced multi-scale approach for text-guided grayscale image colorization

Cited by: 0
Authors
Niu, Chaochao [1]
Tao, Ming [1]
Bao, Bing-Kun [1]
Affiliations
[1] Nanjing Univ Posts & Telecommun, 66 New Mofan Rd, Nanjing 210003, Jiangsu, Peoples R China
Keywords
Text-guided image colorization; Cross-modal semantic enhancement; Multi-scale features; SKETCH;
DOI
10.1016/j.patcog.2024.111203
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High-quality colorization of grayscale images using text descriptions presents a significant challenge, especially in accurately coloring small objects. Existing methods have two major flaws. First, text descriptions typically omit the sizes of objects, so the resulting text features often lack semantic information reflecting object sizes. Second, these methods identify coloring areas by relying solely on low-resolution visual features from the U-Net encoder and fail to effectively leverage the fine-grained information provided by high-resolution visual features. To address these issues, we introduce the Semantic-Enhanced Multi-scale Approach for Text-Guided Grayscale Image Colorization (SEMACOL). We first introduce a Cross-Modal Text Augmentation module that incorporates grayscale images into text features, which enables accurate perception of object sizes in text descriptions. Subsequently, we propose a Multi-scale Content Location module, which utilizes multi-scale features to precisely identify coloring areas within grayscale images. Meanwhile, we incorporate a Text-Influenced Colorization Adjustment module to effectively adjust colorization based on text descriptions. Finally, we implement a Dynamic Feature Fusion Strategy, which dynamically refines outputs from both the Multi-scale Content Location and Text-Influenced Colorization Adjustment modules, ensuring a coherent colorization process. SEMACOL outperforms existing state-of-the-art methods on public datasets. Specifically, SEMACOL achieves a PSNR of 25.695, SSIM of 0.92240, LPIPS of 0.156, and FID of 17.54, surpassing the previous best results (PSNR: 25.511, SSIM: 0.92104, LPIPS: 0.157, FID: 26.93). The code will be available at https://github.com/ChchNiu/SEMACOL.
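For readers unfamiliar with the first metric quoted above, the following is a minimal sketch of how PSNR is conventionally defined, 10 * log10(MAX^2 / MSE). It is not taken from the SEMACOL code: the function name `psnr`, the 8-bit peak value of 255, and the toy pixel values are all assumptions made for illustration, and the abstract does not state which color space or value range the authors evaluate in.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Hypothetical helper for illustration; max_value=255 assumes 8-bit pixels.
    """
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel positions.
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy pixel values, invented for this example (not data from the paper).
reference = [52, 55, 61, 59]
candidate = [54, 55, 60, 57]
print(round(psnr(reference, candidate), 2))  # prints 44.61
```

Higher PSNR means lower pixel-wise error, so the reported values correspond to a gain of 25.695 - 25.511 = 0.184 dB over the previous best result.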
Pages: 11