GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval

Cited: 0
|
Authors
Wang, Yuting [1 ,3 ]
Wang, Jinpeng [1 ,3 ]
Chen, Bin [2 ,3 ]
Zeng, Ziyun [1 ,3 ]
Xia, Shu-Tao [1 ,3 ]
Affiliations
[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China
[2] Harbin Inst Technol, Shenzhen, Peoples R China
[3] Peng Cheng Lab, Res Ctr Artificial Intelligence, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Given a text query, partially relevant video retrieval (PRVR) seeks to find untrimmed videos containing pertinent moments in a database. For PRVR, clip modeling is essential to capture the partial relationship between texts and videos. Current PRVR methods adopt scanning-based clip construction to achieve explicit clip modeling, which is information-redundant and requires a large storage overhead. To solve this efficiency problem, this paper proposes GMMFormer, a Gaussian-Mixture-Model based Transformer that models clip representations implicitly. During frame interactions, we incorporate Gaussian-Mixture-Model constraints to focus each frame on its adjacent frames instead of the whole video. The generated representations then contain multi-scale clip information, achieving implicit clip modeling. In addition, existing PRVR methods ignore the semantic differences between text queries relevant to the same video, leading to a sparse embedding space. We propose a query diverse loss to distinguish these text queries, making the embedding space denser and preserving more semantic information. Extensive experiments on three large-scale video datasets (i.e., TVR, ActivityNet Captions, and Charades-STA) demonstrate the superiority and efficiency of GMMFormer. Code is available at https://github.com/huangmozhi9527/GMMFormer.
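The abstract's central mechanism, constraining frame-to-frame attention with Gaussian windows of several widths so that each frame aggregates information mostly from its temporal neighborhood, can be illustrated with a minimal sketch. This is an illustrative reconstruction, not the authors' implementation: the function names, the fixed sigma values, and the simple averaging over scales are all assumptions.

```python
import numpy as np

def gaussian_attention(frames, sigma):
    """Dot-product self-attention over video frames with a Gaussian
    prior on temporal distance, so each frame attends mostly to its
    neighbors (illustrative sketch, not the paper's exact block).

    frames: (T, D) array of frame features.
    sigma:  width of the Gaussian window; a small sigma models
            short clips, a large sigma models long ones.
    """
    T, D = frames.shape
    # Plain scaled dot-product attention logits.
    logits = frames @ frames.T / np.sqrt(D)
    # Gaussian constraint: penalize attention between distant frames.
    idx = np.arange(T)
    dist2 = (idx[:, None] - idx[None, :]) ** 2
    logits = logits - dist2 / (2.0 * sigma ** 2)
    # Softmax over keys (numerically stable).
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    return w @ frames  # (T, D) clip-aware frame representations

def gmm_block(frames, sigmas=(1.0, 4.0, 16.0)):
    """Mix several Gaussian widths to obtain multi-scale clip
    information without explicitly enumerating clips (sigmas are
    hypothetical values for illustration)."""
    return np.mean([gaussian_attention(frames, s) for s in sigmas], axis=0)
```

Because the clip structure lives in the attention constraint rather than in enumerated clip features, no scanned clip representations need to be stored, which is the efficiency argument the abstract makes.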
Pages: 5767-5775
Page count: 9
Related Papers
50 items in total
  • [31] Gaussian mixture model based phase prior learning for video motion estimation
    Cai, Enjian
    Zhang, Yi
    MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 2022, 175
  • [32] Gaussian Mixture Model Based Player Tracking Technique in Basketball Sports Video
    Jia, Xin-Hui
    Evans, Cawlton
    Journal of Network Intelligence, 2024, 9 (02): 1210-1227
  • [33] CRET: Cross-Modal Retrieval Transformer for Efficient Text-Video Retrieval
    Ji, Kaixiang
    Liu, Jiajia
    Hong, Weixiang
    Zhong, Liheng
    Wang, Jian
    Chen, Jingdong
    Chu, Wei
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022: 949-959
  • [34] Improved Gaussian mixture model in video motion detection
    Xie, Yong
    Journal of Multimedia, 2013, 8 (05): 527-533
  • [35] REPAIRS: Gaussian Mixture Model-based Completion and Optimization of Partially Specified Systems
    Terway, Prerit
    Jha, Niraj K.
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2023, 22 (04)
  • [36] Application of laplacian mixture model to image and video retrieval
    Amin, Tahir
    Zeytinoglu, Mehmet
    Guan, Ling
    IEEE TRANSACTIONS ON MULTIMEDIA, 2007, 9 (07): 1416-1429
  • [37] Content-based image retrieval using a Gaussian mixture model in the wavelet domain
    Yuan, H
    Zhang, XP
    Guan, L
    VISUAL COMMUNICATIONS AND IMAGE PROCESSING 2003, PTS 1-3, 2003, 5150: 422-429
  • [38] Transferable dual multi-granularity semantic excavating for partially relevant video retrieval
    Cheng, Dingxin
    Kong, Shuhan
    Jiang, Bin
    Guo, Qiang
    IMAGE AND VISION COMPUTING, 2024, 149
  • [39] LOW COMPLEXITY ON-LINE VIDEO SUMMARIZATION WITH GAUSSIAN MIXTURE MODEL BASED CLUSTERING
    Ou, Shun-Hsing
    Lee, Chia-Han
    Somayazulu, V. Srinivasa
    Chen, Yen-Kuang
    Chien, Shao-Yi
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [40] Video Surveillance System Based on Gaussian mixture model for moving object detection method
    Xu, Huahu
    Gao, Jue
    Yang, Chenhai
    He, Xiang
    2011 INTERNATIONAL CONFERENCE ON FUTURE COMPUTER SCIENCE AND APPLICATION (FCSA 2011), VOL 3, 2011: 354-357