Multimodal Representation Learning via Maximization of Local Mutual Information

Cited by: 23
Authors
Liao, Ruizhi [1]
Moyer, Daniel [1]
Cha, Miriam [2]
Quigley, Keegan [2]
Berkowitz, Seth [3]
Horng, Steven [3]
Golland, Polina [1]
Wells, William M. [1, 4]
Affiliations
[1] MIT, CSAIL, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] MIT, Lincoln Lab, 244 Wood St, Lexington, MA 02173 USA
[3] Harvard Med Sch, Beth Israel Deaconess Med Ctr, Boston, MA 02115 USA
[4] Harvard Med Sch, Brigham & Womens Hosp, Boston, MA 02115 USA
Keywords
Multimodal representation learning; Local feature representations; Mutual information maximization;
DOI
10.1007/978-3-030-87196-3_26
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We propose and demonstrate a representation learning approach by maximizing the mutual information between local features of images and text. The goal of this approach is to learn useful image representations by taking advantage of the rich information contained in the free text that describes the findings in the image. Our method trains image and text encoders by encouraging the resulting representations to exhibit high local mutual information. We make use of recent advances in mutual information estimation with neural network discriminators. We argue that the sum of local mutual information is typically a lower bound on the global mutual information. Our experimental results in the downstream image classification tasks demonstrate the advantages of using local features for image-text representation learning.
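The idea of scoring local image features against text features can be illustrated with a minimal sketch. The abstract does not name the estimator or the discriminator architecture, so everything below is an assumption made for illustration: the InfoNCE bound is used as one well-known neural mutual information lower bound, a plain dot product stands in for the neural network discriminator, and the function names `info_nce_lower_bound` and `mean_local_mi`, along with the array shapes, are hypothetical. This is not the authors' implementation.

```python
import numpy as np


def info_nce_lower_bound(scores):
    """InfoNCE lower bound on mutual information.

    scores: (n, n) array where scores[i, j] is the discriminator score for
    image feature i paired with text feature j; the diagonal holds the
    matched (positive) pairs. The bound never exceeds log(n).
    """
    n = scores.shape[0]
    # Numerically stable log-sum-exp over each row.
    row_max = scores.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    return float(np.mean(np.diag(scores) - log_norm) + np.log(n))


def mean_local_mi(region_feats, text_feats):
    """Average the InfoNCE bound over local image regions.

    region_feats: (n, r, d) array, r local region features per image.
    text_feats:   (n, d) array, one text feature per paired report.
    Assumption: a dot product replaces the learned discriminator, and
    regions are averaged uniformly.
    """
    n, r, _ = region_feats.shape
    bounds = []
    for k in range(r):
        # (n, n) score matrix between region k of every image and every text.
        scores = region_feats[:, k, :] @ text_feats.T
        bounds.append(info_nce_lower_bound(scores))
    return float(np.mean(bounds))


# Synthetic data: region features are noisy copies of the paired text feature.
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
regions = text[:, None, :] + 0.1 * rng.normal(size=(8, 4, 16))

aligned = mean_local_mi(regions, text)
mismatched = mean_local_mi(regions, np.roll(text, 1, axis=0))
```

In training, the encoders would be updated to increase a quantity like `aligned`; the comparison with `mismatched` only shows that correctly paired image and text features score higher than shuffled pairs under this toy discriminator.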
Pages: 273-283
Number of pages: 11