Towards Unsupervised Referring Expression Comprehension with Visual Semantic Parsing

Times cited: 1

Authors:

Wang, Yaodong [1]

Ji, Zhong [1]

Wang, Di [1]

Pang, Yanwei [1]

Li, Xuelong [2]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

[2] Northwestern Polytech Univ, Sch Artificial Intelligence, OPt & Elect iOPEN, Xian 710072, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2024 / Vol. 285

Funding:

National Natural Science Foundation of China;

Keywords:

referring expression comprehension; unsupervised learning; visual semantic parsing; RECONSTRUCTION;

DOI:

10.1016/j.knosys.2023.111318

CLC number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Referring Expression Comprehension (REC) is the task of localizing, with a bounding box, the specific object in an image that a given referring query describes. Existing fully-supervised or weakly-supervised REC methods rely on manually annotated region coordinates or query texts. In this paper, we propose an unsupervised training paradigm for REC that requires no manually annotated data. Specifically, we introduce a Visual-Semantic-Parsing-based Unsupervised Referring Expression Comprehension framework (VUREC), which uses a Visual Semantic Parser (VSP) as its core module to recognize the rich semantic information in images and to construct pseudo region-query pairs that serve as training supervision for REC. The VSP comprises a Scene Graph Parser (SGP) and a Visual Concept Detector (VCD), which detect the locations, categories, and attributes of objects in an image, as well as the visual relationships among them. Furthermore, we present a Referring Expression Reasoning (RER) model that uses a Multi-Modal Cascade Attention Decoder (MCAD) for fine-grained multi-modal fusion and directly regresses the four bounding-box coordinates of the referred object. Experimental results on three benchmark datasets, RefCOCO, RefCOCO+, and RefCOCOg, demonstrate the effectiveness of the proposed method.
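To make the idea of pseudo region-query pairs concrete, the following is a minimal, hypothetical Python sketch; it is not the authors' VSP or VUREC code. It assumes that a detector has already produced objects (category, attributes, box) and subject-predicate-object relations, and it fills simple templates so that each query is paired with the box of the object it refers to. The names `DetectedObject`, `Relation`, `describe`, `build_pseudo_pairs`, and `to_regression_target`, the template wording, the (x1, y1, x2, y2) pixel box convention, and the normalized center-size regression target are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed box convention: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class DetectedObject:
    """Hypothetical detector output: category, box, and optional attributes."""
    category: str
    box: Box
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relation:
    """Hypothetical scene-graph triple; subject/obj index into the object list."""
    subject: int
    predicate: str
    obj: int


def describe(o: DetectedObject) -> str:
    """Noun phrase such as 'tall man': first attribute (if any) plus category."""
    return " ".join(o.attributes[:1] + [o.category])


def build_pseudo_pairs(objects: List[DetectedObject],
                       relations: List[Relation]) -> List[Tuple[Box, str]]:
    """Template-based pseudo (region, query) pairs (an assumed scheme)."""
    pairs: List[Tuple[Box, str]] = []
    # Attribute-based queries: each object refers to itself.
    for o in objects:
        pairs.append((o.box, describe(o)))
    # Relation-based queries: the subject of the triple is the referred object.
    for r in relations:
        subj, obj = objects[r.subject], objects[r.obj]
        pairs.append((subj.box, f"{describe(subj)} {r.predicate} {describe(obj)}"))
    return pairs


def to_regression_target(box: Box, img_w: float,
                         img_h: float) -> Tuple[float, float, float, float]:
    """Convert a pixel box to normalized (cx, cy, w, h), one common
    direct-regression target format (assumed here, not taken from the paper)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h,
            (x2 - x1) / img_w, (y2 - y1) / img_h)


if __name__ == "__main__":
    objs = [DetectedObject("man", (10, 20, 110, 300), ["tall"]),
            DetectedObject("horse", (90, 80, 400, 320), ["brown"])]
    rels = [Relation(0, "riding", 1)]
    for box, query in build_pseudo_pairs(objs, rels):
        print(box, query, to_regression_target(box, 500, 400))
```

With the two example objects and the single relation, the sketch yields three pairs: "tall man" and "brown horse" for the individual objects, and "tall man riding brown horse" grounded to the man's box. Relation-based templates are included because a bare category can be ambiguous when several objects of the same class appear in one image.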

Pages: 10
