Multimedia event extraction based on multimodal low-dimensional feature representation space

Cited by: 0

Authors:

Cui, Yiming ^{[1]}

Sun, Bin ^{[1]}

Jiang, Tao ^{[1]}

Cui, Hongrui ^{[1]}

Affiliations:

[1] Northwest Minzu Univ, Key Lab Language & Cultural Comp, Minist Educ, Natl Languages Informat Technol, Lanzhou 730000, Gansu, Peoples R China

Source:

SIGNAL IMAGE AND VIDEO PROCESSING | 2025 / Vol. 19 / No. 05

Funding:

Fundamental Research Funds for the Central Universities;

Keywords:

Multimedia event extraction; Multimodal representation learning; Contrast learning; Momentum distillation; Image description generation;

DOI:

10.1007/s11760-025-03999-8

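Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract: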

Research on multimedia event extraction has emerged in recent years. However, because large-scale annotated datasets are lacking, most existing studies rely on weakly supervised methods that use different datasets in the training and testing phases, which inevitably exposes event extraction to differences in dataset distribution and to noise. Meanwhile, although modal fusion can effectively model the correlation and complementarity between modalities, it may also introduce additional noise that degrades the extraction results. To address these problems, we propose a multimedia event extraction method based on a multimodal low-dimensional feature representation space (MLDFR), which focuses on handling noise interference during multimodal fusion. On the one hand, MLDFR combines contrastive learning and momentum distillation to construct a low-dimensional feature representation space, which enhances the model's ability to match text and images in that space and effectively mitigates the interference of dataset noise with multimodal information fusion. On the other hand, during visual event extraction, MLDFR not only fuses the corresponding textual events as additional features, but also generates image descriptions with a generative model and integrates them into the extraction process as further complementary features, so as to better model inter-modal correlations. Experimental results on the benchmark dataset show that the proposed MLDFR method can significantly improve the performance of multimedia event extraction.
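
The abstract does not give MLDFR's loss functions, so the following is only a minimal sketch of the general idea of pairing contrastive image-text matching with momentum distillation, not the authors' implementation. All function names, the temperature, the blending weight `alpha`, and the EMA `decay` are hypothetical choices made for illustration, and only the image-to-text direction of the loss is shown.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def similarity(img, txt, temperature=0.1):
    """Temperature-scaled cosine similarity between every image and text embedding."""
    img = [l2_normalize(v) for v in img]
    txt = [l2_normalize(v) for v in txt]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt] for i in img]

def ema_update(momentum_params, params, decay=0.995):
    """Move the momentum (teacher) parameters slowly toward the student parameters."""
    return [decay * m + (1.0 - decay) * p for m, p in zip(momentum_params, params)]

def distilled_contrastive_loss(sim, sim_momentum, alpha=0.4):
    """Image-to-text cross-entropy against a blend of one-hot pair labels
    and soft targets from the momentum model (assumed weighting: alpha)."""
    n = len(sim)
    loss = 0.0
    for i in range(n):
        probs = softmax(sim[i])
        soft = softmax(sim_momentum[i])
        for j in range(n):
            hard = 1.0 if i == j else 0.0
            target = alpha * soft[j] + (1.0 - alpha) * hard
            loss -= target * math.log(probs[j])
    return loss / n

# Toy batch: image i is paired with text i.
images = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[1.0, 0.0], [0.0, 1.0]]
texts_swapped = [[0.0, 1.0], [1.0, 0.0]]

sim_good = similarity(images, texts_aligned)
sim_bad = similarity(images, texts_swapped)
loss_good = distilled_contrastive_loss(sim_good, sim_good)
loss_bad = distilled_contrastive_loss(sim_bad, sim_bad)
```

In this toy batch the correctly paired embeddings yield a lower loss than the swapped ones. The soft targets from the slowly updated momentum model are what lets such a loss tolerate noisy image-text pairs, since a mislabeled pair is not penalized as harshly as under pure one-hot labels.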


Pages: 15
