A Multimodal Framework for Unsupervised Feature Fusion

Cited by: 0
Authors
Li, Xiaoyi [1 ]
Gao, Jing [1 ]
Li, Hui [1 ]
Yang, Le [1 ]
Srihari, Rohini K. [1 ]
Affiliations
[1] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
Keywords
Multimodal Framework; Feature Fusion; Restricted Boltzmann Machine;
DOI
Not available
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the overwhelming amount of visual content on the Internet today, it is very important to generate meaningful and succinct descriptions of multimedia content, including images and videos. Although manual tags and annotations can partially label some images or videos, it is impossible to exhaustively describe all multimedia data at such a huge scale. The key to this important task is therefore an effective algorithm that can automatically generate a description of an image or a frame. In this paper, we propose a multimodal feature fusion framework that can model any given image-description pair using semantically meaningful features. This framework is trained as a combination of multimodal deep networks with two integral components: an ensemble of image descriptors and a recursive bigram encoder with a fixed-length output feature vector. These two components are then integrated into a joint model characterizing the correlations between images and texts. The proposed framework not only models the unique characteristics of images and texts but also accounts for their correlations at the semantic level. Experiments on real image-text data sets show that the proposed framework is effective and efficient at indexing and retrieving semantically similar pairs, which is very useful for helping people locate interesting images or videos in large-scale databases.
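The abstract mentions a Restricted Boltzmann Machine (see keywords) joining image features and fixed-length text features into one representation. The paper's exact architecture is not given here, so the following is only a minimal hypothetical sketch of that general idea: a joint RBM whose visible layer is the concatenation of an image-descriptor vector and a text-encoder vector, whose hidden layer serves as the fused representation, and which is trained with one step of contrastive divergence (CD-1). All class and method names below are invented for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class JointRBM:
    """Hypothetical joint RBM fusing image and text feature vectors.

    The visible layer concatenates an image-descriptor vector with a
    fixed-length text feature vector; the hidden layer is the shared
    (fused) representation. This is a sketch, not the paper's model.
    """

    def __init__(self, n_image, n_text, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        n_visible = n_image + n_text
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)

    def fuse(self, image_feat, text_feat):
        # Hidden-unit activation probabilities = fused representation.
        v = np.concatenate([image_feat, text_feat])
        return sigmoid(v @ self.W + self.b_hid)

    def cd1_step(self, image_feat, text_feat, lr=0.1, rng=None):
        # One contrastive-divergence (CD-1) update on a single pair.
        rng = rng if rng is not None else np.random.default_rng(0)
        v0 = np.concatenate([image_feat, text_feat])
        h0 = sigmoid(v0 @ self.W + self.b_hid)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h_sample @ self.W.T + self.b_vis)   # reconstruction
        h1 = sigmoid(v1 @ self.W + self.b_hid)
        # Gradient approximation: positive phase minus negative phase.
        self.W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
        self.b_vis += lr * (v0 - v1)
        self.b_hid += lr * (h0 - h1)

if __name__ == "__main__":
    rbm = JointRBM(n_image=8, n_text=4, n_hidden=6)
    img = np.random.default_rng(1).random(8)
    txt = np.random.default_rng(2).random(4)
    rbm.cd1_step(img, txt)
    fused = rbm.fuse(img, txt)
    print(fused.shape)  # fused representation has n_hidden dimensions
```

Semantically similar image-text pairs could then be retrieved by nearest-neighbor search over the fused vectors, which matches the indexing-and-retrieval use case the abstract describes.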
Pages: 897-902
Page count: 6
Related Papers
50 records in total
  • [1] Multimodal dynamic fusion framework: Multilevel feature fusion guided by prompts
    Pan, Lei
    Wu, Huan-Qing
    EXPERT SYSTEMS, 2024, 41 (11)
  • [2] A Geometric Framework for Feature Mappings in Multimodal Fusion of Brain Image Data
    Zhang, Wen
    Mi, Liang
    Thompson, Paul M.
    Wang, Yalin
    INFORMATION PROCESSING IN MEDICAL IMAGING, IPMI 2019, 2019, 11492 : 617 - 630
  • [3] UMMFF: Unsupervised Multimodal Multilevel Feature Fusion Network for Hyperspectral Image Super-Resolution
    Jiang, Zhongmin
    Chen, Mengyao
    Wang, Wenju
    REMOTE SENSING, 2024, 16 (17)
  • [4] MFFDTA: A Multimodal Feature Fusion Framework for Drug-Target Affinity Prediction
    Wang, Wei
    Su, Ziwen
    Liu, Dong
    Zhang, Hongjun
    Shang, Jiangli
    Zhou, Yun
    Wang, Xianfang
    ADVANCED INTELLIGENT COMPUTING IN BIOINFORMATICS, PT II, ICIC 2024, 2024, 14882 : 243 - 254
  • [5] Unsupervised Semantic Segmentation with Feature Fusion
    Zhu, Lifu
    Huang, Jing
    Ye, Shaoxiong
    2023 3RD ASIA-PACIFIC CONFERENCE ON COMMUNICATIONS TECHNOLOGY AND COMPUTER SCIENCE, ACCTCS, 2023, : 162 - 167
  • [6] An efficient framework for unsupervised feature selection
    Zhang, Han
    Zhang, Rui
    Nie, Feiping
    Li, Xuelong
    NEUROCOMPUTING, 2019, 366 : 194 - 207
  • [7] Multimodal feature fusion for concreteness estimation
    Incitti, Francesca
    Snidaro, Lauro
    2022 25TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION 2022), 2022,
  • [8] Fatigue State Recognition System for Miners Based on a Multimodal Feature Extraction and Fusion Framework
    Pan, Hongguang
    Tong, Shiyu
    Wei, Xuqiang
    Teng, Bingyang
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2025, 17 (02) : 410 - 420
  • [9] A Multimodal Feature Fusion Framework for Sleep-Deprived Fatigue Detection to Prevent Accidents
    Virk, Jitender Singh
    Singh, Mandeep
    Singh, Mandeep
    Panjwani, Usha
    Ray, Koushik
    SENSORS, 2023, 23 (08)
  • [10] Unsupervised Multimodal Feature Learning for Semantic Image Segmentation
    Pei, Deli
    Liu, Huaping
    Liu, Yulong
    Sun, Fuchun
    2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,