A Multimodal Framework for Unsupervised Feature Fusion

Cited: 0
Authors
Li, Xiaoyi [1 ]
Gao, Jing [1 ]
Li, Hui [1 ]
Yang, Le [1 ]
Srihari, Rohini K. [1 ]
Affiliations
[1] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
Keywords
Multimodal Framework; Feature Fusion; Restricted Boltzmann Machine
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
With the overwhelming amount of visual content on the Internet today, it is important to generate meaningful and succinct descriptions of multimedia content, including images and videos. Although human tagging and annotation can partially label some images or videos, it is impossible to describe all multimedia data exhaustively at this scale. The key to this task is therefore an effective algorithm that automatically generates a description of an image or a video frame. In this paper, we propose a multimodal feature fusion framework that can model any given image-description pair using semantically meaningful features. The framework is trained as a combination of multimodal deep networks with two integral components: an ensemble of image descriptors and a recursive bigram encoder that produces a fixed-length output feature vector. These two components are then integrated into a joint model characterizing the correlations between images and texts. The proposed framework not only models the unique characteristics of images and texts, but also accounts for their correlations at the semantic level. Experiments on real image-text data sets show that the proposed framework is effective and efficient in indexing and retrieving semantically similar pairs, helping users locate interesting images or videos in large-scale databases.
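The abstract's "recursive bigram encoder with fixed-length output" can be illustrated with a minimal sketch: adjacent word vectors are repeatedly merged pairwise until a single vector of fixed dimension remains, regardless of sentence length. The dimension `DIM`, the `tanh` merge function, and the weight matrix `W` below are assumptions for illustration only; the paper does not specify these details.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 8  # assumed word/phrase embedding size (not specified in the paper)

# Illustrative encoder weights: map a concatenated bigram (2*DIM) back to DIM.
W = rng.standard_normal((DIM, 2 * DIM)) * 0.1
b = np.zeros(DIM)

def encode_bigram(left, right):
    """Merge two child vectors into one parent vector of the same size."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def recursive_encode(word_vectors):
    """Fold a variable-length sequence into one fixed-length vector by
    repeatedly encoding adjacent bigrams (a stand-in for the paper's
    recursive bigram encoder; the actual merge order may differ)."""
    vecs = list(word_vectors)
    while len(vecs) > 1:
        vecs = [encode_bigram(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
    return vecs[0]

# A toy 5-word "sentence": the output size is DIM regardless of input length.
sentence = [rng.standard_normal(DIM) for _ in range(5)]
code = recursive_encode(sentence)
print(code.shape)  # (8,)
```

Because every merge maps two DIM-sized children to one DIM-sized parent, the final vector has the same dimensionality for any input length, which is what allows the text side to be fused with the fixed-size image descriptors in a joint model.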
Pages: 897-902
Page count: 6