GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval

Times cited: 0
Authors
Wang, Yuting [1, 3]
Wang, Jinpeng [1, 3]
Chen, Bin [2, 3]
Zeng, Ziyun [1, 3]
Xia, Shu-Tao [1, 3]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China
[2] Harbin Inst Technol, Shenzhen, Peoples R China
[3] Peng Cheng Lab, Res Ctr Artificial Intelligence, Shenzhen, Peoples R China
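Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract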
Given a text query, partially relevant video retrieval (PRVR) seeks to find untrimmed videos in a database that contain moments pertinent to the query. For PRVR, clip modeling is essential to capture the partial relationship between texts and videos. Current PRVR methods adopt scanning-based clip construction to achieve explicit clip modeling, which is information-redundant and incurs a large storage overhead. To address this efficiency problem, this paper proposes GMMFormer, a Gaussian-Mixture-Model based Transformer that models clip representations implicitly. During frame interactions, we incorporate Gaussian-Mixture-Model constraints to focus each frame on its adjacent frames instead of the whole video. The generated representations then contain multi-scale clip information, achieving implicit clip modeling. In addition, existing PRVR methods ignore semantic differences between text queries relevant to the same video, leading to a sparse embedding space. We propose a query diverse loss to distinguish these text queries, making the embedding space denser and richer in semantic information. Extensive experiments on three large-scale video datasets (i.e., TVR, ActivityNet Captions, and Charades-STA) demonstrate the superiority and efficiency of GMMFormer. Code is available at https://github.com/huangmozhi9527/GMMFormer.
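The idea of constraining frame interactions so that each frame attends mainly to its neighbors can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal pure-Python illustration under stated assumptions: attention scores are plain dot products with no learned projections, each Gaussian window is centered on the query frame, and the function names and the `sigmas` values are hypothetical. Averaging several window widths stands in for the multi-scale mixture.

```python
import math


def gaussian_window(num_frames, sigma):
    """Matrix whose entry (i, j) is exp(-(i - j)^2 / (2 * sigma^2))."""
    return [
        [math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2)) for j in range(num_frames)]
        for i in range(num_frames)
    ]


def gaussian_attention(frames, sigma):
    """Self-attention over frames with softmax weights scaled by a Gaussian window.

    Assumption: queries, keys, and values are the raw frame features.
    Returns (outputs, weights); each row of weights sums to 1.
    """
    t, d = len(frames), len(frames[0])
    window = gaussian_window(t, sigma)
    weights = []
    for i in range(t):
        scores = [
            sum(a * b for a, b in zip(frames[i], frames[j])) / math.sqrt(d)
            for j in range(t)
        ]
        top = max(scores)  # subtract the row maximum for numerical stability
        row = [math.exp(s - top) * window[i][j] for j, s in enumerate(scores)]
        total = sum(row)
        weights.append([w / total for w in row])
    outputs = [
        [sum(weights[i][j] * frames[j][k] for j in range(t)) for k in range(d)]
        for i in range(t)
    ]
    return outputs, weights


def multi_scale_attention(frames, sigmas=(0.5, 2.0, 8.0)):
    """Average the outputs of several window widths (hypothetical sigma values)."""
    per_scale = [gaussian_attention(frames, s)[0] for s in sigmas]
    t, d = len(frames), len(frames[0])
    return [
        [sum(out[i][k] for out in per_scale) / len(per_scale) for k in range(d)]
        for i in range(t)
    ]
```

A small `sigma` makes each frame aggregate only its immediate neighbors (a short clip), while a large `sigma` approaches ordinary full-video attention, so combining several widths yields representations that mix clip information at different temporal scales.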
Pages: 5767-5775
Page count: 9
Related papers
50 in total
  • [1] GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
    Wang, Yuting
    Wang, Jinpeng
    Chen, Bin
    Zeng, Ziyun
    Xia, Shu-Tao
    arXiv, 2023,
  • [2] Partially Relevant Video Retrieval
    Dong, Jianfeng
    Chen, Xianke
    Zhang, Minsong
    Yang, Xun
    Chen, Shujie
    Li, Xirong
    Wang, Xun
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022,
  • [3] Gaussian-Mixture-Model Based Clutter Suppression in Perceptive Mobile Networks
    Rahman, Md Lushanur
    Zhang, J. Andrew
    Huang, Xiaojing
    Guo, Y. Jay
    Lu, Zhiping
    IEEE COMMUNICATIONS LETTERS, 2021, 25 (01) : 152 - 156
  • [4] Unsupervised Emotional Scene Detection for Lifelog Video Retrieval Based on Gaussian Mixture Model
    Nomiya, Hiroki
    Morikuni, Atsushi
    Hochin, Teruhisa
    17TH INTERNATIONAL CONFERENCE IN KNOWLEDGE BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS - KES2013, 2013, 22 : 375 - 384
  • [5] Efficient Video Object Segmentation Based on Gaussian Mixture Model and Markov Random Field
    Liu, Zhi
    Gu, Jiandong
    Shen, Liquan
    Zhang, Zhaoyang
    ICSP: 2008 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-5, PROCEEDINGS, 2008, : 1006 - 1009
  • [6] Video object segmentation based on Gaussian mixture model
    School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
    Hsi An Chiao Tung Ta Hsueh, 2006, 6 (724-728):
  • [7] Video Segmentation Based on the Gaussian Mixture Updating Model
    Geng, Jie
    Miao, Zhenjiang
    Liang, Qinghua
    Wang, Shu
    Wu, Hao
    2015 8TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING (CISP), 2015, : 52 - 56
  • [8] Effective Color Image Retrieval Based on the Gaussian Mixture Model
    Luszczkiewicz-Piatek, Maria
    Smolka, Bogdan
    COMPUTATIONAL COLOR IMAGING, 2011, 6626 : 199 - +
  • [9] Image Similarity in Gaussian Mixture Model Based Image Retrieval
    Luszczkiewicz-Piatek, Maria
    IMAGE PROCESSING AND COMMUNICATIONS CHALLENGES 8, 2017, 525 : 87 - 95
  • [10] GAUSSIAN MIXTURE MODEL BASED APPROACH TO COLOR IMAGE RETRIEVAL
    Luszkiewicz, Maria
    Smolka, Bogdan
    PROCEEDINGS OF THE 2007 15TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING, 2007, : 527 - +