Learning Visual Prior via Generative Pre-Training

Times cited: 0
Authors
Xie, Jinheng [1 ]
Ye, Kai [2 ]
Li, Yudong [2 ]
Li, Yuexiang [3 ]
Lin, Kevin Qinghong [1 ]
Zheng, Yefeng [3 ]
Shen, Linlin [2 ]
Shou, Mike Zheng [1 ]
Affiliations
[1] Natl Univ Singapore, Show Lab, Singapore, Singapore
[2] Shenzhen Univ, Shenzhen, Peoples R China
[3] Tencent YouTu Lab, Jarvis Res Ctr, Shenzhen, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Funding
National Research Foundation, Singapore;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented as the visual prior, e.g., object location and shape, in the model. Such a prior potentially impacts many vision tasks. For example, in conditional image synthesis, spatial conditions that fail to adhere to the prior can result in visually inaccurate synthetic results. This work aims to explicitly learn the visual prior and enable customized sampling. Inspired by advances in language modeling, we propose to learn the Visual prior via Generative Pre-Training, dubbed VISORGPT. By discretizing visual locations, e.g., bounding boxes, human poses, and instance masks, into sequences, VISORGPT can model the visual prior through likelihood maximization. In addition, prompt engineering is investigated to unify various visual locations and enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate the effectiveness of VISORGPT in modeling the visual prior and extrapolating to novel scenes, suggesting that discrete visual locations can be integrated into the learning paradigm of current language models to further perceive the visual world.
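To make the idea stated in the abstract concrete, the following is a minimal, illustrative sketch in Python/PyTorch (not the authors' released code): bounding-box coordinates are quantized into discrete tokens, and a small GPT-style model is trained by maximizing the likelihood of the resulting sequences. All names here (serialize_boxes, ToyPriorLM, NUM_BINS, the toy boxes) are hypothetical stand-ins; prompt engineering, customized sampling, and the handling of human poses and instance masks are omitted.

    import torch
    import torch.nn as nn

    NUM_BINS = 256                      # assumed coordinate quantization granularity
    BOS, EOS = NUM_BINS, NUM_BINS + 1
    VOCAB = NUM_BINS + 2

    def serialize_boxes(boxes, img_w, img_h):
        # Map pixel-space (x0, y0, x1, y1) boxes to a discrete token sequence:
        # [BOS, x0, y0, x1, y1, x0, y0, x1, y1, ..., EOS]
        seq = [BOS]
        for x0, y0, x1, y1 in boxes:
            for v, scale in ((x0, img_w), (y0, img_h), (x1, img_w), (y1, img_h)):
                seq.append(min(int(v / scale * NUM_BINS), NUM_BINS - 1))
        seq.append(EOS)
        return torch.tensor(seq)

    class ToyPriorLM(nn.Module):
        # A tiny causal Transformer standing in for the GPT-style backbone.
        def __init__(self, d=128, n_layers=2, n_heads=4, max_len=512):
            super().__init__()
            self.tok = nn.Embedding(VOCAB, d)
            self.pos = nn.Embedding(max_len, d)
            layer = nn.TransformerEncoderLayer(d, n_heads, 4 * d, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d, VOCAB)

        def forward(self, tokens):
            T = tokens.size(1)
            x = self.tok(tokens) + self.pos(torch.arange(T, device=tokens.device))
            mask = nn.Transformer.generate_square_subsequent_mask(T)
            return self.head(self.encoder(x, mask=mask))

    # Likelihood maximization: predict every token from its prefix (teacher forcing).
    boxes = [(30, 40, 200, 220), (250, 60, 400, 300)]    # toy annotations
    seq = serialize_boxes(boxes, img_w=512, img_h=512).unsqueeze(0)
    model = ToyPriorLM()
    logits = model(seq[:, :-1])
    loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), seq[:, 1:].reshape(-1))
    loss.backward()
    print(f"negative log-likelihood: {loss.item():.3f}")

Sampling from such a trained model would then produce coordinate sequences that follow the learned prior, which is the customized-sampling use case the abstract describes.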
Pages: 19