Supervised cross-modal factor analysis for multiple modal data classification

被引:13
|
作者
Wang, Jingbin [1 ,2 ]
Zhou, Yihua [3 ]
Duan, Kanghong [4 ]
Wang, Jim Jing-Yan [5 ]
Bensmail, Halima [6 ]
机构
[1] Chinese Acad Sci, Natl Time Serv Ctr, Xian 710600, Peoples R China
[2] Chinese Acad Sci, Grad Univ, Beijing 100039, Peoples R China
[3] Lehigh Univ, Dept Mech Engn & Mech, Bethlehem, PA 18015 USA
[4] State Ocean Adm, North China Sea Marine Tech Support Ctr, Qingdao 266033, Peoples R China
[5] King Abdullah Univ Sci & Technol, Comp Elect & Math Sci & Engn Div, Thuwal 23955, Saudi Arabia
[6] Qatar Comp Res Inst, Doha 5825, Qatar
来源
2015 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2015): BIG DATA ANALYTICS FOR HUMAN-CENTRIC SYSTEMS | 2015年
关键词
Multiple modal learning; Cross-modal factor analysis; Supervised learning; SPARSE REPRESENTATION; TEXT CLASSIFICATION; SURFACE; ACTIVATION; NETWORK;
D O I
10.1109/SMC.2015.329
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
In this paper we study the problem of learning from multiple modal data for purpose of document classification. In this problem, each document is composed two different modals of data, i.e., an image and a text. Cross-modal factor analysis (CFA) has been proposed to project the two different modals of data to a shared data space, so that the classification of a image or a text can be performed directly in this space. A disadvantage of CFA is that it has ignored the supervision information. In this paper, we improve CFA by incorporating the supervision information to represent and classify both image and text modals of documents. We project both image and text data to a shared data space by factor analysis, and then train a class label predictor in the shared space to use the class label information. The factor analysis parameter and the predictor parameter are learned jointly by solving one single objective function. With this objective function, we minimize the distance between the projections of image and text of the same document, and the classification error of the projection measured by hinge loss function. The objective function is optimized by an alternate optimization strategy in an iterative algorithm. Experiments in two different multiple modal document data sets show the advantage of the proposed algorithm over other CFA methods.
引用
收藏
页码:1882 / 1888
页数:7
相关论文
共 50 条
  • [21] Supervised Hierarchical Online Hashing for Cross-modal Retrieval
    Han, Kai
    Liu, Yu
    Wei, Rukai
    Zhou, Ke
    Xu, Jinhui
    Long, Kun
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (04)
  • [22] Correlation Autoencoder Hashing for Supervised Cross-Modal Search
    Cao, Yue
    Long, Mingsheng
    Wang, Jianmin
    Zhu, Han
    ICMR'16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2016, : 197 - 204
  • [23] Supervised Contrastive Discrete Hashing for cross-modal retrieval
    Li, Ze
    Yao, Tao
    Wang, Lili
    Li, Ying
    Wang, Gang
    KNOWLEDGE-BASED SYSTEMS, 2024, 295
  • [24] Discriminative correlation hashing for supervised cross-modal retrieval
    Lu, Xu
    Zhang, Huaxiang
    Sun, Jiande
    Wang, Zhenhua
    Guo, Peilian
    Wan, Wenbo
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2018, 65 : 221 - 230
  • [25] Discrete Robust Supervised Hashing for Cross-Modal Retrieval
    Yao, Tao
    Zhang, Zhiwang
    Yan, Lianshan
    Yue, Jun
    Tian, Qi
    IEEE ACCESS, 2019, 7 : 39806 - 39814
  • [26] Supervised Hierarchical Deep Hashing for Cross-Modal Retrieval
    Zhan, Yu-Wei
    Luo, Xin
    Wang, Yongxin
    Xu, Xin-Shun
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3386 - 3394
  • [27] Semi-supervised cross-modal hashing via modality-specific and cross-modal graph convolutional networks
    Wu, Fei
    Li, Shuaishuai
    Gao, Guangwei
    Ji, Yimu
    Jing, Xiao-Yuan
    Wan, Zhiguo
    PATTERN RECOGNITION, 2023, 136
  • [28] Testing Simulation Theory with Cross-Modal Multivariate Classification of fMRI Data
    Etzel, Joset A.
    Gazzola, Valeria
    Keysers, Christian
    PLOS ONE, 2008, 3 (11):
  • [29] X-ModalNet: A semi-supervised deep cross-modal network for classification of remote sensing data
    Hong, Danfeng
    Yokoya, Naoto
    Xia, Gui-Song
    Chanussot, Jocelyn
    Zhu, Xiao Xiang
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2020, 167 : 12 - 23
  • [30] KERNEL CROSS-MODAL FACTOR ANALYSIS FOR MULTIMODAL INFORMATION FUSION
    Wang, Yongjin
    Guan, Ling
    Venetsanopoulos, A. N.
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 2384 - 2387