Cross-Modal Person Search: A Coarse-to-Fine Framework using Bi-directional Text-Image Matching

Cited by: 4
Authors
Yu, Xiaojing [1]
Chen, Tianlong [1]
Yang, Yang [2]
Mugo, Michael [2]
Wang, Zhangyang [1]
Affiliations
[1] Texas A&M Univ, College Stn, TX 77843 USA
[2] Walmart Technol, New York, NY USA
Keywords
NETWORK;
DOI
10.1109/ICCVW.2019.00223
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Searching for person images in a gallery based on natural language descriptions remains a challenging and under-explored cross-modal retrieval problem. To improve the accuracy of an image-based retrieval task, e.g., person re-identification (Person Re-Id), re-ranking is known to be an effective post-processing tool. In this paper, we extend re-ranking from uni-modal retrieval to cross-modal retrieval for the first time, and develop a bi-directional coarse-to-fine framework (BCF) for cross-modal person search. Built on a recent state-of-the-art Person Re-Id model [5], BCF exploits first text-to-image and then image-to-text relevance in a two-stage refinement. BCF ranks competitively against a strong baseline [24] on the newly introduced WIDER Person Search dataset [1], boosting validation performance by 9.01% (top-1) / 3.87% (mAP) on val1 and by 6.60% (top-1) / 3.49% (mAP) on val2. With a high score, our solution ranks competitively in the ICCV 2019 WIDER Person Search by Language Challenge.
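The two-stage idea in the abstract (a coarse text-to-image ranking followed by an image-to-text refinement of the top candidates) can be illustrated with a minimal sketch. The abstract does not specify how BCF scores or fuses the two directions, so everything here is an assumption made for illustration: the function name `coarse_to_fine_search`, the cosine similarity on precomputed embeddings, the softmax-style image-to-text score, and the `top_k` and `alpha` parameters are all hypothetical and are not the authors' method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def coarse_to_fine_search(query_idx, text_emb, image_emb, top_k=3, alpha=0.5):
    """Rank gallery images for one text query in two stages (illustrative only)."""
    # Similarity of every text description to every gallery image.
    sim = [[cosine(t, g) for g in image_emb] for t in text_emb]

    # Stage 1 (coarse): rank the whole gallery by text-to-image similarity.
    t2i = sim[query_idx]
    coarse = sorted(range(len(image_emb)), key=lambda j: -t2i[j])
    candidates = coarse[:top_k]

    # Stage 2 (fine): for each candidate image, measure how strongly it
    # prefers the query over the other texts (assumed softmax over texts).
    def i2t(j):
        weights = [math.exp(sim[i][j]) for i in range(len(text_emb))]
        return weights[query_idx] / sum(weights)

    # Assumed fusion: weighted sum of the two directions, top_k only.
    refined = sorted(
        candidates,
        key=lambda j: -(alpha * t2i[j] + (1 - alpha) * i2t(j)),
    )
    # Images outside the top_k keep their coarse order.
    return refined + coarse[top_k:]

# Toy embeddings: two text descriptions, three gallery images.
texts = [[1.0, 0.0], [0.0, 1.0]]
images = [[1.0, 0.1], [0.9, 0.5], [0.0, 1.0]]
print(coarse_to_fine_search(0, texts, images, top_k=2))
```

Only the top candidates are re-scored in this sketch, so the second stage can reorder them but never promotes an image from outside the coarse shortlist.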
Pages: 1799-1804
Page count: 6
Related papers
50 in total
  • [21] Cross-modal Semantic Interference Suppression for image-text matching
    Yao, Tao
    Peng, Shouyong
    Sun, Yujuan
    Sheng, Guorui
    Fu, Haiyan
    Kong, Xiangwei
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [23] Cross-modal Graph Matching Network for Image-text Retrieval
    Cheng, Yuhao
    Zhu, Xiaoguang
    Qian, Jiuchao
    Wen, Fei
    Liu, Peilin
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (04)
  • [24] MACA: Memory-aided Coarse-to-fine Alignment for Text-based Person Search
    Su, Liangxu
    Quan, Rong
    Qi, Zhiyuan
    Qin, Jie
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2497 - 2501
  • [25] Cross-modal alignment with synthetic caption for text-based person search
    Zhao, Weichen
    Lu, Yuxing
    Liu, Zhiyuan
    Yang, Yuan
    Jiao, Ge
    International Journal of Multimedia Information Retrieval, 2025, 14 (2)
  • [26] Text-based person search via cross-modal alignment learning
    Ke, Xiao
    Liu, Hao
    Xu, Peirong
    Lin, Xinru
    Guo, Wenzhong
    PATTERN RECOGNITION, 2024, 152
  • [27] Learning Text-image Joint Embedding for Efficient Cross-modal Retrieval with Deep Feature Engineering
    Xie, Zhongwei
    Liu, Ling
    Wu, Yanzhao
    Zhong, Luo
    Li, Lin
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2022, 40 (04)
  • [28] A bi-directional attention guided cross-modal network for music based dance generation
    Fan, Di
    Wan, Lili
    Xu, Wanru
    Wang, Shenghui
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 103
  • [29] Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval
    Lin, Dixuan
    Peng, Yi-Xing
    Meng, Jingke
    Zheng, Wei-Shi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 6609 - 6620
  • [30] Coarse-to-fine dual-level attention for video-text cross modal retrieval
    Jin, Ming
    Zhang, Huaxiang
    Zhu, Lei
    Sun, Jiande
    Liu, Li
    KNOWLEDGE-BASED SYSTEMS, 2022, 242