Cross-Modal Person Search: A Coarse-to-Fine Framework using Bi-directional Text-Image Matching

Cited by: 4
Authors
Yu, Xiaojing [1]
Chen, Tianlong [1]
Yang, Yang [2]
Mugo, Michael [2]
Wang, Zhangyang [1]
Affiliations
[1] Texas A&M Univ, College Stn, TX 77843 USA
[2] Walmart Technol, New York, NY USA
Keywords
NETWORK;
DOI
10.1109/ICCVW.2019.00223
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Searching for person images in a gallery based on natural language descriptions remains a challenging and under-explored cross-modal retrieval problem. To improve the accuracy of an image-based retrieval task, e.g., person re-identification (Person Re-Id), re-ranking is known to be an effective post-processing tool. In this paper, we extend re-ranking from uni-modal retrieval to cross-modal retrieval for the first time, and develop a bi-directional coarse-to-fine framework (BCF) for cross-modal person search. Built on a recent state-of-the-art Person Re-Id model [5], BCF exploits first text-to-image and then image-to-text relevance, in a two-stage refinement fashion. BCF ranks competitively against a strong baseline [24] on the newly introduced WIDER Person Search dataset [1], boosting validation set performance by 9.01% (top-1) / 3.87% (mAP) on val1 and 6.60% (top-1) / 3.49% (mAP) on val2. With a high score, our solution ranks competitively in the ICCV 2019 WIDER Person Search by Language Challenge.
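The two-stage idea in the abstract (text-to-image relevance first, image-to-text relevance second) can be illustrated with a small sketch. This is not the authors' BCF implementation: the function name `coarse_to_fine_rerank`, the parameters `top_k` and `alpha`, the softmax-based reverse score, and the toy similarity matrix are all assumptions made for illustration only.

```python
import numpy as np

def coarse_to_fine_rerank(sim, top_k=2, alpha=0.5):
    """Re-rank gallery images per text query (illustrative sketch only).

    sim[q, g] is a hypothetical similarity between text query q and
    gallery image g; top_k and alpha are assumed hyperparameters.
    """
    sim = np.asarray(sim, dtype=float)
    num_queries, num_gallery = sim.shape
    k = min(top_k, num_gallery)

    # Image-to-text relevance: for each image (column), a softmax over all
    # text queries, i.e. how strongly the image "prefers" each query.
    shifted = sim - sim.max(axis=0, keepdims=True)
    img2txt = np.exp(shifted)
    img2txt /= img2txt.sum(axis=0, keepdims=True)

    rankings = []
    for q in range(num_queries):
        # Coarse stage: order the gallery by text-to-image similarity.
        coarse = np.argsort(-sim[q], kind="stable")
        head, tail = coarse[:k], coarse[k:]
        # Fine stage: re-order only the top-k candidates using a fused score.
        fused = sim[q, head] + alpha * img2txt[q, head]
        head = head[np.argsort(-fused, kind="stable")]
        rankings.append(np.concatenate([head, tail]))
    return np.stack(rankings)

# Toy example: 2 text queries, 3 gallery images (made-up scores).
toy_sim = [[0.80, 0.82, 0.10],
           [0.10, 0.90, 0.20]]
print(coarse_to_fine_rerank(toy_sim, top_k=2, alpha=0.5).tolist())
# prints [[0, 1, 2], [1, 2, 0]]
```

In the toy example, image 1 narrowly leads for query 0 in the coarse stage, but it matches query 1 even better, so the reverse (image-to-text) score moves image 0 ahead of it. Setting `alpha=0` recovers the plain text-to-image ranking.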
Pages: 1799-1804
Page count: 6