Multimedia event extraction based on multimodal low-dimensional feature representation space

Times cited: 0
Authors
Cui, Yiming [1]
Sun, Bin [1]
Jiang, Tao [1]
Cui, Hongrui [1]
Affiliation
[1] Northwest Minzu Univ, Key Lab Language & Cultural Comp, Minist Educ, Natl Languages Informat Technol, Lanzhou 730000, Gansu, Peoples R China
Funding
Fundamental Research Funds for the Central Universities
Keywords
Multimedia event extraction; Multimodal representation learning; Contrastive learning; Momentum distillation; Image description generation
DOI
10.1007/s11760-025-03999-8
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract
In recent years, multimedia event extraction has emerged as a research topic. However, because large-scale annotated datasets are lacking, most existing studies rely on weakly supervised methods that train and test on different datasets, which inevitably exposes event extraction to distribution shifts between datasets and to dataset noise. Moreover, although modal fusion can effectively model the correlation and complementarity between modalities, it can also introduce additional noise that degrades the extraction results. To address these problems, we propose a multimedia event extraction method based on a multimodal low-dimensional feature representation space (MLDFR), which focuses on suppressing noise interference during multimodal fusion. On the one hand, MLDFR combines contrastive learning and momentum distillation to construct a low-dimensional feature representation space; this improves the model's ability to match text and images in that space and mitigates the interference of dataset noise with multimodal information fusion. On the other hand, during visual event extraction, MLDFR not only fuses the corresponding textual events as additional features but also uses a generative model to produce descriptions of the images, which it integrates into the extraction process as further complementary features to better model inter-modal correlations. Experimental results on the benchmark dataset show that MLDFR significantly improves multimedia event extraction performance.
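The abstract does not specify MLDFR's loss functions. As a rough, non-authoritative illustration of how contrastive learning and momentum distillation can shape a shared low-dimensional space, the sketch below assumes an ALBEF-style image-text contrastive loss: image and text features are linearly projected into a low-dimensional space and L2-normalized, a momentum (exponential moving average) copy of the projections produces softened similarity targets, and the loss is a cross-entropy against targets that mix one-hot matching labels with those soft targets. All function names, the temperature `tau`, the mixing weight `alpha`, and the momentum coefficient `m` are hypothetical choices, not values from the paper; encoders, feature queues, and gradient updates are omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

# ASSUMPTION: an ALBEF-style contrastive loss with momentum distillation,
# not the paper's confirmed formulation. All names are hypothetical.

def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a d_in vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]

def project(w: Matrix, feats: List[Vector]) -> List[Vector]:
    """Map each high-dimensional feature into the shared low-dimensional space."""
    return [l2_normalize(matvec(w, f)) for f in feats]

def similarity(a: List[Vector], b: List[Vector], tau: float) -> Matrix:
    """Temperature-scaled cosine similarities (inputs are already unit-norm)."""
    return [[sum(x * y for x, y in zip(u, v)) / tau for v in b] for u in a]

def softmax(row: Vector) -> Vector:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]

def ema_update(online: Matrix, momentum: Matrix, m: float = 0.995) -> Matrix:
    """Momentum-model update: theta_m <- m * theta_m + (1 - m) * theta."""
    return [[m * wm + (1 - m) * wo for wo, wm in zip(ro, rm)]
            for ro, rm in zip(online, momentum)]

def itc_loss_with_md(img: List[Vector], txt: List[Vector],
                     w_img: Matrix, w_txt: Matrix,
                     w_img_m: Matrix, w_txt_m: Matrix,
                     tau: float = 0.07, alpha: float = 0.4) -> float:
    """Symmetric image-text contrastive loss with momentum-distilled targets.

    img[k] and txt[k] are assumed to be a matched pair; every other pair
    in the batch is treated as a negative.
    """
    zi, zt = project(w_img, img), project(w_txt, txt)
    zi_m, zt_m = project(w_img_m, img), project(w_txt_m, txt)
    n = len(img)
    total = 0.0
    directions: List[Tuple[Matrix, Matrix]] = [
        (similarity(zi, zt, tau), similarity(zi_m, zt_m, tau)),  # image-to-text
        (similarity(zt, zi, tau), similarity(zt_m, zi_m, tau)),  # text-to-image
    ]
    for sim, sim_m in directions:
        for k in range(n):
            p = softmax(sim[k])    # online model's matching distribution
            q = softmax(sim_m[k])  # momentum model's soft pseudo-targets
            target = [(1 - alpha) * (1.0 if j == k else 0.0) + alpha * q[j]
                      for j in range(n)]
            total -= sum(t * math.log(pj) for t, pj in zip(target, p) if t > 0)
    return total / (2 * n)

if __name__ == "__main__":
    w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # hypothetical 3-d -> 2-d projection
    img = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # toy features, reused as text features
    print("aligned pairs: ", itc_loss_with_md(img, img, w, w, w, w))
    print("shuffled pairs:", itc_loss_with_md(img, img[::-1], w, w, w, w))
    w_m = ema_update(online=w, momentum=w, m=0.995)  # momentum copy after one step
```

In this toy run, correctly paired inputs give a much lower loss than shuffled ones. Mixing the one-hot labels with the momentum model's softened similarities means that weakly matched pairs (for example, noisy image-text pairs) receive softer supervision instead of a hard label, which is the usual intuition for using momentum distillation against dataset noise; whether MLDFR weights the terms this way is not stated in the abstract.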
Pages: 15