GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval

Cited by: 0

Authors:

Wang, Yuting [1,3]

Wang, Jinpeng [1,3]

Chen, Bin [2,3]

Zeng, Ziyun [1,3]

Xia, Shu-Tao [1,3]

Affiliations:

[1] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China

[2] Harbin Inst Technol, Shenzhen, Peoples R China

[3] Peng Cheng Lab, Res Ctr Artificial Intelligence, Shenzhen, Peoples R China

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6 | 2024

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Given a text query, partially relevant video retrieval (PRVR) seeks to find untrimmed videos in a database that contain pertinent moments. For PRVR, clip modeling is essential to capture the partial relationship between texts and videos. Current PRVR methods adopt scanning-based clip construction to achieve explicit clip modeling, which is information-redundant and incurs a large storage overhead. To solve this efficiency problem, this paper proposes GMMFormer, a Gaussian-Mixture-Model based Transformer that models clip representations implicitly. During frame interactions, we incorporate Gaussian-Mixture-Model constraints to focus each frame on its adjacent frames instead of the whole video. The generated representations then contain multi-scale clip information, achieving implicit clip modeling. In addition, existing PRVR methods ignore semantic differences between text queries relevant to the same video, leading to a sparse embedding space. We propose a query diverse loss to distinguish these text queries, making the embedding space more intensive and richer in semantic information. Extensive experiments on three large-scale video datasets (i.e., TVR, ActivityNet Captions, and Charades-STA) demonstrate the superiority and efficiency of GMMFormer. Code is available at https://github.com/huangmozhi9527/GMMFormer.
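The idea of constraining frame interactions so that each frame attends mainly to its neighbours can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked in the abstract for that). It assumes that the constraint is a Gaussian window over frame distance applied to the attention weights, that queries, keys, and values are the raw frame features with no learned projections, and that the multi-scale "mixture" is a plain average over several window widths. The function names and the `sigmas` values are hypothetical.

```python
import math


def gaussian_window(num_frames, sigma):
    """W[i][j] = exp(-(i - j)^2 / (2 * sigma^2)): prior weight of frame j seen from frame i."""
    return [
        [math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2)) for j in range(num_frames)]
        for i in range(num_frames)
    ]


def gaussian_attention(frames, sigma):
    """Self-attention over frame features, damped by a Gaussian locality window.

    Assumption: queries, keys, and values are the raw features (no projections).
    """
    n, d = len(frames), len(frames[0])
    window = gaussian_window(n, sigma)
    output = []
    for i in range(n):
        # Scaled dot-product scores between frame i and every frame j.
        scores = [
            sum(a * b for a, b in zip(frames[i], frames[j])) / math.sqrt(d)
            for j in range(n)
        ]
        top = max(scores)  # subtract the max for numerical stability
        # Softmax numerator multiplied by the Gaussian window, then renormalised.
        weights = [math.exp(s - top) * window[i][j] for j, s in enumerate(scores)]
        total = sum(weights)
        weights = [w / total for w in weights]
        output.append(
            [sum(weights[j] * frames[j][k] for j in range(n)) for k in range(d)]
        )
    return output


def multi_scale_attention(frames, sigmas=(0.5, 1.0, 5.0)):
    """Average several Gaussian-constrained passes (hypothetical aggregation)."""
    passes = [gaussian_attention(frames, s) for s in sigmas]
    n, d = len(frames), len(frames[0])
    return [
        [sum(p[i][k] for p in passes) / len(passes) for k in range(d)]
        for i in range(n)
    ]
```

A narrow window (small `sigma`) keeps each frame close to its own feature, while a wide window approaches ordinary global attention, so combining several widths gives each frame a view of clips at different temporal scales.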


Pages: 5767-5775

Page count: 9

