Cross-Modal Person Search: A Coarse-to-Fine Framework using Bi-directional Text-Image Matching

Cited by: 4

Authors:

Yu, Xiaojing [1]

Chen, Tianlong [1]

Yang, Yang [2]

Mugo, Michael [2]

Wang, Zhangyang [1]

Affiliations:

[1] Texas A&M Univ, College Stn, TX 77843 USA

[2] Walmart Technol, New York, NY USA

Source:

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW) | 2019

Keywords:

NETWORK;

DOI:

10.1109/ICCVW.2019.00223

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Searching for person images in a gallery based on natural language descriptions remains a challenging and under-explored cross-modal retrieval problem. To improve the accuracy of an image-based retrieval task, e.g., person re-identification (Person Re-Id), re-ranking is known to be an effective post-processing tool. In this paper, we extend re-ranking from uni-modal retrieval to cross-modal retrieval for the first time, and develop a bi-directional coarse-to-fine framework (BCF) for cross-modal person search. Built on a recent state-of-the-art Person Re-Id model [5], BCF exploits first text-to-image and then image-to-text relevance, in a two-stage refinement fashion. BCF ranks competitively against a strong baseline [24] on the newly introduced WIDER Person Search dataset [1], boosting validation set performance by 9.01% (top-1) / 3.87% (mAP) on val1 and 6.60% (top-1) / 3.49% (mAP) on val2. With a high score, our solution ranks competitively in the ICCV 2019 WIDER Person Search by Language Challenge.
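The two-stage idea in the abstract (a coarse text-to-image ranking, then a refinement using image-to-text relevance) can be illustrated with a small sketch. This is not the authors' BCF implementation: the function name `coarse_to_fine_rerank`, the parameters `k` and `weight`, the reciprocal-rank fusion rule, and the toy similarity scores are all assumptions chosen only to show the general shape of bi-directional re-ranking.

```python
from typing import List


def coarse_to_fine_rerank(sim: List[List[float]], query: int,
                          k: int = 3, weight: float = 0.5) -> List[int]:
    """Return image indices ranked for text `query`.

    sim[t][i] is a hypothetical similarity between text t and image i.
    """
    n_texts = len(sim)
    n_images = len(sim[query])

    # Coarse stage: rank all images by text-to-image similarity.
    coarse = sorted(range(n_images), key=lambda i: -sim[query][i])
    top, rest = coarse[:k], coarse[k:]

    def reverse_rank(i: int) -> int:
        # Image-to-text view: how many texts match image i better
        # than the query text does (0 means the query is the best match).
        return sum(1 for t in range(n_texts) if sim[t][i] > sim[query][i])

    def fused(i: int) -> float:
        # Assumed fusion rule: forward similarity plus a reciprocal
        # reverse-rank bonus, balanced by `weight`.
        return weight * sim[query][i] + (1.0 - weight) / (1 + reverse_rank(i))

    # Fine stage: re-order only the top-k candidates; the tail is unchanged.
    refined = sorted(top, key=lambda i: -fused(i))
    return refined + rest


# Toy scores: 3 texts x 4 images (illustrative values only).
toy_sim = [
    [0.90, 0.80, 0.10, 0.20],
    [0.95, 0.30, 0.20, 0.10],
    [0.10, 0.20, 0.70, 0.30],
]
ranking = coarse_to_fine_rerank(toy_sim, query=0, k=2)
```

In this toy case image 0 has the highest forward score for text 0, but text 1 matches it even better, so the reverse check promotes image 1 within the top-k candidates. Setting `weight=1.0` disables the reverse check and reproduces the coarse ranking.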

Pages: 1799-1804

Page count: 6

50 items in total

[1] Coarse-to-Fine Semantic Alignment for Cross-Modal Moment Localization
Hu, Yupeng
Nie, Liqiang
Liu, Meng
Wang, Kun
Wang, Yinglong
Hua, Xian-Sheng
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 5933 - 5943
[2] Cross-modal feature learning and alignment network for text-image person re-identification
Huang, Bailiang
Qi, Xiaolong
Chen, Bin
JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 103
[3] Text-Image Matching for Cross-Modal Remote Sensing Image Retrieval via Graph Neural Network
Yu, Hongfeng
Yao, Fanglong
Lu, Wanxuan
Liu, Nayu
Li, Peiguang
You, Hongjian
Sun, Xian
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2023, 16 : 812 - 824
[4] Improving text-image cross-modal retrieval with contrastive loss
Chumeng Zhang
Yue Yang
Junbo Guo
Guoqing Jin
Dan Song
An An Liu
Multimedia Systems, 2023, 29 : 569 - 575
[5] Improving text-image cross-modal retrieval with contrastive loss
Zhang, Chumeng
Yang, Yue
Guo, Junbo
Jin, Guoqing
Song, Dan
Liu, An An
MULTIMEDIA SYSTEMS, 2023, 29 (02) : 569 - 575
[6] Cross-modal semantic aligning and neighbor-aware completing for robust text-image person retrieval
Gong, Tiantian
Wang, Junsheng
Zhang, Liyan
INFORMATION FUSION, 2024, 112
[7] CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Wang, Zihao
Liu, Xihui
Li, Hongsheng
Sheng, Lu
Yan, Junjie
Wang, Xiaogang
Shao, Jing
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5763 - 5772
[8] A Coarse-to-Fine Text Matching Framework for Customer Service Question Answering
Li, Ang
Liang, Xingwei
Zhang, Miao
Wang, Bingbing
Chen, Guanrong
Gao, Jun
Lin, Qihui
Xu, Ruifeng
COGNITIVE COMPUTING, ICCC 2022, 2022, 13734 : 39 - 53
[9] Cross-Modal Dual Matching and Comparison for Text-to-Image Person Re-identification
Cao, Lin
Sun, Wenwen
Guo, Yanan
Wang, Shoujing
Lv, Boqian
PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 246 - 259
[10] Unified Text-Image Space Alignment with Cross-Modal Prompting in CLIP for UDA
Jiao, Yifan
Cai, Chenglong
Bao, Bing-Kun
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2025, 21 (03)
