Contrastive Adversarial Training for Multi-Modal Machine Translation

Cited by: 2
|
Authors
Huang, Xin [1 ]
Zhang, Jiajun [1 ]
Zong, Chengqing [1 ]
Affiliations
[1] Univ Chinese Acad Sci, Chinese Acad Sci, Sch Artificial Intelligence, Natl Lab Pattern Recognit, Inst Automat, Intelligence Bldg, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Keywords
contrastive learning; adversarial training; multi-modal machine translation
DOI
10.1145/3587267
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The multi-modal machine translation task aims to improve translation quality with the help of additional visual input, which is expected to disambiguate or complement the semantics when a sentence contains ambiguous words or incomplete expressions. Existing methods have tried many ways to fuse visual information into text representations. However, only a minority of sentences need extra visual information as a complement, and without guidance, models tend to learn text-only translation from the majority of well-aligned translation pairs. In this article, we propose a contrastive adversarial training approach that enhances visual participation in semantic representation learning. By contrasting the multi-modal input with adversarial samples, the model learns to identify the most informative sample, which is coupled with a congruent image and several visual objects extracted from it. This approach prevents the visual information from being ignored and further fuses cross-modal information. We evaluate our method on three multi-modal language pairs. Experimental results show that our model improves translation accuracy, and further analysis shows that it is more sensitive to visual information.
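The contrastive objective described in the abstract can be sketched as an InfoNCE-style loss: the representation of a sentence paired with its congruent image serves as the positive, while adversarially constructed pairs (e.g., the same sentence with incongruent images) serve as negatives. The function below is a minimal illustrative sketch, not the authors' implementation; the name `info_nce`, the `temperature` value, and the plain-Python vector representations are assumptions for illustration.

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor.

    anchor:    representation of the multi-modal input (list of floats)
    positive:  representation of the congruent (well-matched) sample
    negatives: representations of adversarial (incongruent) samples
    Returns the cross-entropy of classifying the positive as class 0
    among {positive} + negatives, using cosine similarity as the logit.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def unit(v):
        n = math.sqrt(dot(v, v))
        return [x / n for x in v]

    a = unit(anchor)
    # Similarity logits: the congruent pair first, then the adversarial ones.
    logits = [dot(a, unit(positive)) / temperature]
    logits += [dot(a, unit(n)) / temperature for n in negatives]

    # Numerically stable cross-entropy with the congruent pair as the target:
    # loss = logsumexp(logits) - logits[0]
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits)) - logits[0]
```

Minimizing this loss pushes the anchor toward the congruent pairing and away from the adversarial ones, which is the mechanism the abstract credits with keeping visual information from being ignored.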
Pages: 18
Related Papers
(50 entries in total)
  • [31] Noise-Robust Semi-supervised Multi-modal Machine Translation. Li, Lin; Hu, Kaixi; Tayir, Turghun; Liu, Jianquan; Lee, Kong Aik. PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2022, 13630: 155-168.
  • [32] Hindi Visual Genome: A Dataset for Multi-Modal English to Hindi Machine Translation. Parida, Shantipriya; Bojar, Ondrej; Dash, Satya Ranjan. COMPUTACION Y SISTEMAS, 2019, 23 (04): 1499-1505.
  • [33] Probing Multi-modal Machine Translation with Pre-trained Language Model. Kong, Yawei; Fan, Kai. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021: 3689-3699.
  • [34] An error analysis for image-based multi-modal neural machine translation. Calixto, Iacer; Liu, Qun. MACHINE TRANSLATION, 2019, 33 (1-2): 155-177.
  • [35] Visual Topic Semantic Enhanced Machine Translation for Multi-Modal Data Efficiency. Wang, Chao; Cai, Si-Jia; Shi, Bei-Xiang; Chong, Zhi-Hong. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2023, 38 (06): 1223-1236.
  • [36] Multi-modal Adversarial Training for Crisis-related Data Classification on Social Media. Chen, Qi; Wang, Wei; Huang, Kaizhu; De, Suparna; Coenen, Frans. 2020 IEEE INTERNATIONAL CONFERENCE ON SMART COMPUTING (SMARTCOMP), 2020: 232-237.
  • [37] UAC-AD: Unsupervised Adversarial Contrastive Learning for Anomaly Detection on Multi-Modal Data in Microservice Systems. Liu, Hongyi; Huang, Xiaosong; Jia, Mengxi; Jia, Tong; Han, Jing; Wu, Zhonghai; Li, Ying. IEEE TRANSACTIONS ON SERVICES COMPUTING, 2024, 17 (06): 3887-3900.
  • [38] Task-Adversarial Adaptation for Multi-modal Recommendation. Su, Hongzu; Li, Jingjing; Li, Fengling; Zhu, Lei; Lu, Ke; Yang, Yang. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 6530-6538.
  • [39] Adversarial learning for mono- or multi-modal registration. Fan, Jingfan; Cao, Xiaohuan; Wang, Qian; Yap, Pew-Thian; Shen, Dinggang. MEDICAL IMAGE ANALYSIS, 2019, 58.
  • [40] CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations. Zolfaghari, Mohammadreza; Zhu, Yi; Gehler, Peter; Brox, Thomas. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 1430-1439.