Discrete codebook collaborating with transformer for thangka image inpainting

Cited by: 0
Authors
Bai, Jinxian [1 ]
Fan, Yao [1 ]
Zhao, Zhiwei [1 ]
Affiliations
[1] Xizang Minzu Univ, Sch Informat Engn, Xianyang 712000, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image inpainting; Thangka images; Transformer; Cross-shaped window attention; Codebook;
D O I
10.1007/s00530-024-01439-0
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Thangka, as a precious heritage of painting art, holds irreplaceable research value due to its richness in Tibetan history, religious beliefs, and folk culture. However, it is susceptible to partial damage and form distortion due to natural erosion or inadequate conservation measures. Given the complexity of textures and rich semantics in thangka images, existing image inpainting methods struggle to recover their original artistic style and intricate details. In this paper, we propose a novel approach combining discrete codebook learning with a transformer for image inpainting, tailored specifically for thangka images. In the codebook learning stage, we design an improved network framework based on vector quantization (VQ) codebooks to discretely encode intermediate features of input images, yielding a context-rich discrete codebook. The second phase introduces a parallel transformer module based on a cross-shaped window, which efficiently predicts the index combinations for missing regions at limited computational cost. Furthermore, we devise a multi-scale feature guidance module that progressively fuses features from intact areas with textural features from the codebook, thereby enhancing the preservation of local details in non-damaged regions. We validate the efficacy of our method through qualitative and quantitative experiments on datasets including CelebA-HQ, Places2, and a custom thangka dataset. Experimental results demonstrate that compared to previous methods, our approach successfully reconstructs images with more complete structural information and clearer textural details.
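The codebook stage described in the abstract follows the standard vector-quantization idea: each continuous encoder feature is replaced by the index of its nearest codebook entry, so that the transformer can later treat missing regions as index combinations to predict. The paper's actual network is not reproduced here; a minimal NumPy sketch of just the lookup step, using a toy random codebook and a hypothetical `quantize` helper, might look like:

```python
import numpy as np

def quantize(features, codebook):
    """Map each continuous feature vector to its nearest codebook entry.

    features: (N, D) array of encoder outputs, one row per spatial location.
    codebook: (K, D) array of learned discrete embeddings.
    Returns the chosen indices (N,) and the quantized vectors (N, D).
    """
    # Squared Euclidean distance between every feature and every code,
    # expanded as ||f||^2 - 2 f.c + ||c||^2 to avoid an explicit loop.
    d = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    indices = d.argmin(axis=1)          # one discrete token id per location
    return indices, codebook[indices]   # quantized (discretized) features

# Toy usage: 4 feature vectors of dimension 8, codebook of 3 entries.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
codes = rng.normal(size=(3, 8))
ids, quantized = quantize(feats, codes)
```

In a trained VQ model the codebook entries are learned jointly with the encoder and decoder; here they are random purely to illustrate the lookup.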
Pages: 17
Related papers
50 records in total
  • [11] Continuously Masked Transformer for Image Inpainting
    Ko, Keunsoo
    Kim, Chang-Su
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 13123 - 13132
  • [12] DLFormer: Discrete Latent Transformer for Video Inpainting
    School of Computer Science and Engineering, South China University of Technology, China
    Proc IEEE Comput Soc Conf Comput Vision Pattern Recognit, : 3501 - 3510
  • [13] No-Reference Quality Assessment Method for Inpainting Thangka Image Based on Multiple Features
    Ye Yuqi
    Hu Wenjin
    LASER & OPTOELECTRONICS PROGRESS, 2020, 57 (08)
  • [14] Damaged region filling and evaluation by symmetrical exemplar-based image inpainting for Thangka
    Wang, Weilan
    Jia, Yanjun
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2017
  • [16] Bidirectional interaction of CNN and Transformer for image inpainting
    Liu, Jialu
    Gong, Maoguo
    Gao, Yuan
    Lu, Yiheng
    Li, Hao
    KNOWLEDGE-BASED SYSTEMS, 2024, 299
  • [17] A transformer–CNN for deep image inpainting forensics
    Zhu, Xinshan
    Lu, Junyan
    Ren, Honghao
    Wang, Hongquan
    Sun, Biao
    The Visual Computer, 2023, 39 : 4721 - 4735
  • [18] IMAGE INPAINTING BY MSCSWIN TRANSFORMER ADVERSARIAL AUTOENCODER
    Chen, Bo-Wei
    Liu, Tsung-Jung
    Liu, Kuan-Hsien
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2040 - 2044
  • [19] TSFormer: Tracking Structure Transformer for Image Inpainting
    Lin, Jiayu
    Wang, Yuan-gen
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (12)
  • [20] Edge-Guided Image Inpainting with Transformer
    Liang, Huining
    Kambhamettu, Chandra
    ADVANCES IN VISUAL COMPUTING, ISVC 2023, PT II, 2023, 14362 : 285 - 296