Fine-Grained Video Retrieval With Scene Sketches

Cited by: 2
Authors
Zuo, Ran [1, 2]
Deng, Xiaoming [1, 2]
Chen, Keqi [1, 2]
Zhang, Zhengming [1, 2]
Lai, Yu-Kun [3]
Liu, Fang [4]
Ma, Cuixia [1, 2]
Wang, Hao [5]
Liu, Yong-Jin [4]
Wang, Hongan [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Software, Beijing Key Lab Human Comp Interact, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Dept Comp Sci & Technol, Beijing 101408, Peoples R China
[3] Cardiff Univ, Dept Comp Sci & Informat, Cardiff CF24 4AG, Wales
[4] Tsinghua Univ, BNRist, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[5] Alibaba, Beijing 100102, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Semantics; Visualization; Convolutional neural networks; Layout; Image coding; Encoding; Fine-grained sketch-based video retrieval; sketch-video dataset; scene sketch; graph convolutional networks
DOI
10.1109/TIP.2023.3278474
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Benefiting from the intuitiveness and naturalness of sketch interaction, sketch-based video retrieval (SBVR) has received considerable attention in the video retrieval research area. However, most existing SBVR research still lacks the capability of accurate video retrieval with fine-grained scene content. To address this problem, in this paper we investigate a new task, which focuses on retrieving the target video by utilizing a fine-grained storyboard sketch depicting the scene layout and the visual characteristics of the major foreground instances of the video (e.g., appearance, size, and pose); we call such a task "fine-grained scene-level SBVR". The most challenging issue in this task is how to perform scene-level cross-modal alignment between sketch and video. Our solution consists of two parts. First, we construct a scene-level sketch-video dataset called SketchVideo, in which sketch-video pairs are provided and each pair contains a clip-level storyboard sketch and several keyframe sketches (corresponding to video frames). Second, we propose a novel deep learning architecture called Sketch Query Graph Convolutional Network (SQ-GCN). In SQ-GCN, we first adaptively sample the video frames to improve video encoding efficiency, and then construct appearance and category graphs to jointly model visual and semantic alignment between sketch and video. Experiments show that our fine-grained scene-level SBVR framework with the SQ-GCN architecture outperforms the state-of-the-art fine-grained retrieval methods. The SketchVideo dataset and SQ-GCN code are available on the project webpage at https://iscas-mmsketch.github.io/FG-SL-SBVR/.
Pages: 3136-3149
Page count: 14
Related papers
50 in total
  • [1] FIVR: Fine-Grained Incident Video Retrieval
    Kordopatis-Zilos, Giorgos
    Papadopoulos, Symeon
    Patras, Ioannis
    Kompatsiaris, Ioannis
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (10) : 2638 - 2652
  • [2] Fine-Grained Retrieval with Autoencoders
    Portenier, Tiziano
    Hu, Qiyang
    Favaro, Paolo
    Zwicker, Matthias
    PROCEEDINGS OF THE 13TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISIGRAPP 2018), VOL 5: VISAPP, 2018, : 85 - 95
  • [3] Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning
    Chen, Shizhe
    Zhao, Yida
    Jin, Qin
    Wu, Qi
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 10635 - 10644
  • [4] Lifelong Fine-Grained Image Retrieval
    Chen, Wei
    Xu, Haoyang
    Pu, Nan
    Liu, Yu
    Lao, Mingrui
    Wang, Weiping
    Liu, Li
    Lew, Michael S.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 7533 - 7544
  • [5] Fine-Grained Retrieval Prompt Tuning
    Wang, Shijie
    Chang, Jianlong
    Wang, Zhihui
    Li, Haojie
    Ouyang, Wanli
    Tian, Qi
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 2, 2023, : 2644 - 2652
  • [6] Video-Music Retrieval with Fine-Grained Cross-Modal Alignment
    Era, Yuki
    Togo, Ren
    Maeda, Keisuke
    Ogawa, Takahiro
    Haseyama, Miki
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2005 - 2009
  • [7] Fine-Grained Instance-Level Sketch-Based Video Retrieval
    Xu, Peng
    Liu, Kun
    Xiang, Tao
    Hospedales, Timothy M.
    Ma, Zhanyu
    Guo, Jun
    Song, Yi-Zhe
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (05) : 1995 - 2007
  • [8] FeatInter: Exploring fine-grained object features for video-text retrieval
    Liu, Baolong
    Zheng, Qi
    Wang, Yabing
    Zhang, Minsong
    Dong, Jianfeng
    Wang, Xun
    NEUROCOMPUTING, 2022, 496 : 178 - 191
  • [9] Fine-grained Audible Video Description
    Shen, Xuyang
    Li, Dong
    Zhou, Jinxing
    Qin, Zhen
    He, Bowen
    Han, Xiaodong
    Li, Aixuan
    Dai, Yuchao
    Kong, Lingpeng
    Wang, Meng
    Qiao, Yu
    Zhong, Yiran
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10585 - 10596
  • [10] Fine-Grained Scalable Video Caching
    Gong, Qiushi
    Woods, John W.
    Kar, Koushik
    Chakareski, Jacob
    2015 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2015, : 101 - 106