Cross-Modal Person Search: A Coarse-to-Fine Framework using Bi-directional Text-Image Matching

Cited by: 4

Authors:

Yu, Xiaojing [1]

Chen, Tianlong [1]

Yang, Yang [2]

Mugo, Michael [2]

Wang, Zhangyang [1]

Affiliations:

[1] Texas A&M Univ, College Stn, TX 77843 USA

[2] Walmart Technol, New York, NY USA

Source:

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW) | 2019

Keywords:

NETWORK;

DOI:

10.1109/ICCVW.2019.00223

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Searching for person images in a gallery based on natural language descriptions remains a challenging and under-explored cross-modal retrieval problem. To improve the accuracy of an image-based retrieval task, e.g., person re-identification (Person Re-Id), re-ranking is known to be an effective post-processing tool. In this paper, we extend re-ranking from uni-modal retrieval to cross-modal retrieval for the first time, and develop a bi-directional coarse-to-fine framework (BCF) for cross-modal person search. Built on a recent state-of-the-art Person Re-Id model [5], BCF exploits first text-to-image and then image-to-text relevance, in a two-stage refinement fashion. BCF ranks competitively against a strong baseline [24] on the newly introduced WIDER Person Search dataset [1], boosting validation set performance by 9.01% (top-1) / 3.87% (mAP) on val1 and 6.60% (top-1) / 3.49% (mAP) on val2, respectively. With a high score, our solution ranks competitively in the ICCV 2019 WIDER Person Search by Language Challenge.


Pages: 1799-1804

Page count: 6

50 items in total

[21] Cross-modal Semantic Interference Suppression for image-text matching
Yao, Tao
Peng, Shouyong
Sun, Yujuan
Sheng, Guorui
Fu, Haiyan
Kong, Xiangwei
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
[22] Cross-modal Semantic Interference Suppression for image-text matching
Yao, Tao
Peng, Shouyong
Sun, Yujuan
Sheng, Guorui
Fu, Haiyan
Kong, Xiangwei
Engineering Applications of Artificial Intelligence, 2024, 133
[23] Cross-modal Graph Matching Network for Image-text Retrieval
Cheng, Yuhao
Zhu, Xiaoguang
Qian, Jiuchao
Wen, Fei
Liu, Peilin
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (04)
[24] MACA: Memory-aided Coarse-to-fine Alignment for Text-based Person Search
Su, Liangxu
Quan, Rong
Qi, Zhiyuan
Qin, Jie
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2497 - 2501
[25] Cross-modal alignment with synthetic caption for text-based person search
Weichen Zhao
Yuxing Lu
Zhiyuan Liu
Yuan Yang
Ge Jiao
International Journal of Multimedia Information Retrieval, 2025, 14 (2)
[26] Text-based person search via cross-modal alignment learning
Ke, Xiao
Liu, Hao
Xu, Peirong
Lin, Xinru
Guo, Wenzhong
PATTERN RECOGNITION, 2024, 152
[27] Learning Text-image Joint Embedding for Efficient Cross-modal Retrieval with Deep Feature Engineering
Xie, Zhongwei
Liu, Ling
Wu, Yanzhao
Zhong, Luo
Li, Lin
ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2022, 40 (04)
[28] A bi-directional attention guided cross-modal network for music based dance generation
Fan, Di
Wan, Lili
Xu, Wanru
Wang, Shenghui
COMPUTERS & ELECTRICAL ENGINEERING, 2022, 103
[29] Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval
Lin, Dixuan
Peng, Yi-Xing
Meng, Jingke
Zheng, Wei-Shi
IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6609 - 6620
[30] Coarse-to-fine dual-level attention for video-text cross modal retrieval
Jin, Ming
Zhang, Huaxiang
Zhu, Lei
Sun, Jiande
Liu, Li
KNOWLEDGE-BASED SYSTEMS, 2022, 242
