Towards Unsupervised Referring Expression Comprehension with Visual Semantic Parsing

Cited by: 1

Authors:

Wang, Yaodong [1]

Ji, Zhong [1]

Wang, Di [1]

Pang, Yanwei [1]

Li, Xuelong [2]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

[2] Northwestern Polytech Univ, Sch Artificial Intelligence, OPt & Elect iOPEN, Xian 710072, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2024 / Vol. 285

Funding:

National Natural Science Foundation of China;

Keywords:

referring expression comprehension; unsupervised learning; visual semantic parsing; RECONSTRUCTION;

DOI:

10.1016/j.knosys.2023.111318

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Referring Expression Comprehension (REC) is the task of grounding, in the form of a bounding box, the specific object in an image that a given referring query describes. Existing fully-supervised or weakly-supervised REC methods rely on either manually annotated regional coordinates or query texts. In this paper, we propose an unsupervised training paradigm for the REC task that does not require any manually annotated data. Specifically, we introduce a Visual-Semantic-Parsing-based Unsupervised Referring Expression Comprehension framework (VUREC), which leverages a Visual Semantic Parser (VSP) as its core module to recognize the rich semantic information in images and construct pseudo-region-query pairs as the training supervision for REC. The VSP comprises a Scene Graph Parser (SGP) and a Visual Concept Detector (VCD), which together detect the locations, categories, and attributes of objects in images, as well as the visual relationships among them. Furthermore, we present a Referring Expression Reasoning (RER) model that utilizes a Multi-Modal Cascade Attention Decoder (MCAD) for fine-grained multi-modality fusion and directly regresses the four coordinates of the referred object. Experimental results on three benchmark datasets, RefCOCO, RefCOCO+ and RefCOCOg, demonstrate the effectiveness of the proposed method.
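The abstract does not say how pseudo-region-query pairs are assembled from the parser output, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the parser yields objects (box, category, attributes) and subject-predicate-object relations, and that queries are formed by simple string templates. All names here (`DetectedObject`, `Relation`, `build_pseudo_pairs`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


@dataclass
class DetectedObject:
    """One parsed object: location, category, attributes (hypothetical type)."""
    box: Box
    category: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relation:
    """A subject-predicate-object triple; indices refer to the object list."""
    subject: int
    predicate: str
    obj: int


def describe(o: DetectedObject) -> str:
    # Assumed template: attributes followed by the category, e.g. "tall man".
    return " ".join(o.attributes + [o.category])


def build_pseudo_pairs(
    objects: List[DetectedObject], relations: List[Relation]
) -> List[Tuple[Box, str]]:
    """Pair each region with template-generated pseudo queries."""
    pairs = [(o.box, describe(o)) for o in objects]
    for r in relations:
        subj, target = objects[r.subject], objects[r.obj]
        # A relational query is grounded to the subject's box.
        pairs.append((subj.box, f"{describe(subj)} {r.predicate} {describe(target)}"))
    return pairs


objects = [
    DetectedObject((10, 20, 110, 220), "man", ["tall"]),
    DetectedObject((90, 150, 300, 400), "horse", ["brown"]),
]
relations = [Relation(subject=0, predicate="riding", obj=1)]
pairs = build_pseudo_pairs(objects, relations)
```

Under these assumptions, each pair could serve as a (region, query) training sample for a grounding model that regresses box coordinates from the query, with no human annotation involved.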


Pages: 10
