Discrete codebook collaborating with transformer for thangka image inpainting

Cited by: 0
Authors
Bai, Jinxian [1 ]
Fan, Yao [1 ]
Zhao, Zhiwei [1 ]
Affiliations
[1] Xizang Minzu Univ, Sch Informat Engn, Xianyang 712000, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image inpainting; Thangka images; Transformer; Cross-shaped window attention; Codebook;
DOI
10.1007/s00530-024-01439-0
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Thangka, as a precious heritage of painting art, holds irreplaceable research value due to its richness in Tibetan history, religious beliefs, and folk culture. However, thangka works are susceptible to partial damage and distortion of form caused by natural erosion or inadequate conservation. Given the complexity of textures and the rich semantics of thangka images, existing image inpainting methods struggle to recover their original artistic style and intricate details. In this paper, we propose a novel approach that combines discrete codebook learning with a transformer for image inpainting, tailored specifically to thangka images. In the first stage, codebook learning, we design an improved network framework based on vector-quantization (VQ) codebooks that discretely encodes intermediate features of the input images, yielding a context-rich discrete codebook. The second stage introduces a parallel transformer module based on a cross-shaped window, which efficiently predicts the index combinations of missing regions at limited computational cost. Furthermore, we devise a multi-scale feature guidance module that progressively fuses features from intact areas with textural features from the codebook, thereby better preserving local details in undamaged regions. We validate the efficacy of our method through qualitative and quantitative experiments on the CelebA-HQ and Places2 datasets and a custom thangka dataset. The results demonstrate that, compared with previous methods, our approach reconstructs images with more complete structural information and clearer textural details.
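The core discretization step the abstract describes can be sketched as follows. This is a minimal illustration of VQ codebook lookup, not the paper's implementation: each continuous feature vector is replaced by its nearest entry in a learned codebook, and the resulting discrete index is what a transformer can later predict for missing regions. The codebook size (512) and feature dimension (64) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical learned codebook: 512 entries of 64-dim feature vectors.
codebook = rng.normal(size=(512, 64))

def quantize(features):
    """Map each feature vector (row) to its nearest codebook entry.

    Returns the discrete indices and the quantized (snapped) features.
    """
    # Squared Euclidean distance between every feature and every code vector,
    # computed via broadcasting: (N, 1, D) - (1, K, D) -> (N, K, D).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)        # one discrete token per feature
    return indices, codebook[indices]     # tokens + quantized features

# Example: quantize 10 intermediate feature vectors from an encoder.
features = rng.normal(size=(10, 64))
idx, quantized = quantize(features)
```

In the paper's two-stage design, inpainting then reduces to predicting the index sequence `idx` for damaged regions, conditioned on the indices of intact regions; the decoder maps the predicted codebook entries back to image space.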
Pages: 17