Learning Visual Prior via Generative Pre-Training

Times cited: 0
Authors
Xie, Jinheng [1 ]
Ye, Kai [2 ]
Li, Yudong [2 ]
Li, Yuexiang [3 ]
Lin, Kevin Qinghong [1 ]
Zheng, Yefeng [3 ]
Shen, Linlin [2 ]
Shou, Mike Zheng [1 ]
Affiliations
[1] Natl Univ Singapore, Show Lab, Singapore, Singapore
[2] Shenzhen Univ, Shenzhen, Peoples R China
[3] Tencent YouTu Lab, Jarvis Res Ctr, Shenzhen, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Funding
National Research Foundation, Singapore;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented as the visual prior, e.g., object location and shape, in the model. Such a prior potentially impacts many vision tasks. For example, in conditional image synthesis, spatial conditions that fail to adhere to the prior can result in visually inaccurate synthetic results. This work aims to explicitly learn the visual prior and enable customized sampling. Inspired by advances in language modeling, we propose to learn the Visual prior via Generative Pre-Training, dubbed VISORGPT. By discretizing visual locations, e.g., bounding boxes, human poses, and instance masks, into sequences, VISORGPT can model the visual prior through likelihood maximization. In addition, prompt engineering is investigated to unify various visual locations and enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate the effectiveness of VISORGPT in modeling the visual prior and extrapolating to novel scenes, suggesting that discrete visual locations can be integrated into the learning paradigm of current language models to further perceive the visual world.
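To make the idea stated in the abstract concrete, the following is a minimal, illustrative sketch in Python/PyTorch (not the authors' released code): bounding-box coordinates are quantized into discrete tokens, and a small GPT-style model is trained by maximizing the likelihood of the resulting sequences. All names here (serialize_boxes, ToyPriorLM, NUM_BINS, the toy boxes) are hypothetical stand-ins; prompt engineering, customized sampling, and the handling of human poses and instance masks are omitted.

    import torch
    import torch.nn as nn

    NUM_BINS = 256                      # assumed coordinate quantization granularity
    BOS, EOS = NUM_BINS, NUM_BINS + 1
    VOCAB = NUM_BINS + 2

    def serialize_boxes(boxes, img_w, img_h):
        # Map pixel-space (x0, y0, x1, y1) boxes to a discrete token sequence:
        # [BOS, x0, y0, x1, y1, x0, y0, x1, y1, ..., EOS]
        seq = [BOS]
        for x0, y0, x1, y1 in boxes:
            for v, scale in ((x0, img_w), (y0, img_h), (x1, img_w), (y1, img_h)):
                seq.append(min(int(v / scale * NUM_BINS), NUM_BINS - 1))
        seq.append(EOS)
        return torch.tensor(seq)

    class ToyPriorLM(nn.Module):
        # A tiny causal Transformer standing in for the GPT-style backbone.
        def __init__(self, d=128, n_layers=2, n_heads=4, max_len=512):
            super().__init__()
            self.tok = nn.Embedding(VOCAB, d)
            self.pos = nn.Embedding(max_len, d)
            layer = nn.TransformerEncoderLayer(d, n_heads, 4 * d, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d, VOCAB)

        def forward(self, tokens):
            T = tokens.size(1)
            x = self.tok(tokens) + self.pos(torch.arange(T, device=tokens.device))
            mask = nn.Transformer.generate_square_subsequent_mask(T)
            return self.head(self.encoder(x, mask=mask))

    # Likelihood maximization: predict every token from its prefix (teacher forcing).
    boxes = [(30, 40, 200, 220), (250, 60, 400, 300)]    # toy annotations
    seq = serialize_boxes(boxes, img_w=512, img_h=512).unsqueeze(0)
    model = ToyPriorLM()
    logits = model(seq[:, :-1])
    loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), seq[:, 1:].reshape(-1))
    loss.backward()
    print(f"negative log-likelihood: {loss.item():.3f}")

Sampling from such a trained model would then produce coordinate sequences that follow the learned prior, which is the customized-sampling use case the abstract describes.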
Pages: 19