Contrastive Adversarial Training for Multi-Modal Machine Translation

Cited: 2
Authors
Huang, Xin [1 ]
Zhang, Jiajun [1 ]
Zong, Chengqing [1 ]
Affiliations
[1] Univ Chinese Acad Sci, Chinese Acad Sci, Sch Artificial Intelligence, Natl Lab Pattern Recognit, Inst Automat, Intelligence Bldg, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Keywords
contrastive learning; adversarial training; multi-modal machine translation
DOI
10.1145/3587267
CLC Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The multi-modal machine translation task aims to improve translation quality with the help of additional visual input, which is expected to disambiguate or complement the semantics of sentences containing ambiguous words or incomplete expressions. Existing methods have explored many ways to fuse visual information into text representations. However, only a minority of sentences need extra visual information as a complement, so without guidance, models tend to learn text-only translation from the majority of well-aligned translation pairs. In this article, we propose a contrastive adversarial training approach to enhance visual participation in semantic representation learning. By contrasting the multi-modal input with adversarial samples, the model learns to identify the most informative sample, which is coupled with a congruent image and several visual objects extracted from it. This approach prevents the visual information from being ignored and further fuses cross-modal information. We evaluate our method on three multi-modal language pairs. Experimental results show that our model improves translation accuracy, and further analysis shows that it is more sensitive to visual information.
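The abstract only sketches the training objective at a high level. As an illustration (not the authors' implementation), the following minimal PyTorch sketch shows one plausible InfoNCE-style form of the contrastive objective described above: the fused text representation should score its congruent image higher than a set of adversarial (mismatched) images. The function name, tensor shapes, and temperature value are assumptions for this sketch only.

```python
# Minimal sketch of the contrastive idea described in the abstract.
# Assumptions (not from the paper): an InfoNCE-style objective, PyTorch,
# and pre-computed text/image representations of dimension D.
import torch
import torch.nn.functional as F

def contrastive_adversarial_loss(text_repr: torch.Tensor,
                                 pos_img_repr: torch.Tensor,
                                 neg_img_reprs: torch.Tensor,
                                 temperature: float = 0.1) -> torch.Tensor:
    """text_repr: (B, D) sentence representations.
    pos_img_repr: (B, D) representations of the congruent images.
    neg_img_reprs: (B, K, D) representations of K adversarial (mismatched) images.
    """
    text = F.normalize(text_repr, dim=-1)
    pos = F.normalize(pos_img_repr, dim=-1)
    neg = F.normalize(neg_img_reprs, dim=-1)

    # Cosine similarity of each sentence with its congruent image ...
    pos_score = (text * pos).sum(dim=-1, keepdim=True)    # (B, 1)
    # ... and with each of its K adversarial images.
    neg_score = torch.einsum("bd,bkd->bk", text, neg)     # (B, K)

    # The congruent pair sits at index 0; training ranks it above the negatives.
    logits = torch.cat([pos_score, neg_score], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors:
B, K, D = 4, 8, 256
loss = contrastive_adversarial_loss(torch.randn(B, D),
                                    torch.randn(B, D),
                                    torch.randn(B, K, D))
```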
Pages: 18