Contrastive Adversarial Training for Multi-Modal Machine Translation

Cited by: 2
Authors
Huang, Xin [1 ]
Zhang, Jiajun [1 ]
Zong, Chengqing [1 ]
Affiliations
[1] Univ Chinese Acad Sci, Chinese Acad Sci, Sch Artificial Intelligence, Natl Lab Pattern Recognit, Inst Automat, Intelligence Bldg, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Keywords
contrastive learning; adversarial training; multi-modal machine translation
DOI
10.1145/3587267
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The multi-modal machine translation task aims to improve translation quality with the help of additional visual input, which is expected to disambiguate or complement the semantics when sentences contain ambiguous words or incomplete expressions. Existing methods have explored many ways of fusing visual information into text representations. However, only a minority of sentences need extra visual information as a complement, and without guidance, models tend to learn text-only translation from the majority of well-aligned translation pairs. In this article, we propose a contrastive adversarial training approach that enhances visual participation in semantic representation learning. By contrasting the multi-modal input with adversarial samples, the model learns to identify the most informative sample, namely the one coupled with a congruent image and several visual objects extracted from it. This approach prevents the visual information from being ignored and further fuses cross-modal information. We evaluate our method on three multi-modal language pairs. Experimental results show that our model improves translation accuracy, and further analysis shows that it is more sensitive to visual information.
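To make the contrastive idea concrete, below is a minimal sketch (not the authors' released code) of an InfoNCE-style objective matching the description in the abstract: the representation of the multi-modal input is pulled toward its congruent image and pushed away from adversarial (mismatched) samples. The function name, tensor shapes, and temperature value are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(text_repr, pos_image_repr, neg_image_reprs, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative sketch).

    text_repr:       (batch, dim)    sentence representation
    pos_image_repr:  (batch, dim)    congruent image representation
    neg_image_reprs: (batch, k, dim) adversarial (mismatched) image representations
    """
    text = F.normalize(text_repr, dim=-1)
    pos = F.normalize(pos_image_repr, dim=-1)
    neg = F.normalize(neg_image_reprs, dim=-1)

    # Cosine similarity with the congruent image: (batch, 1)
    pos_sim = (text * pos).sum(dim=-1, keepdim=True)
    # Cosine similarity with each adversarial image: (batch, k)
    neg_sim = torch.einsum('bd,bkd->bk', text, neg)

    # The congruent image sits at index 0 of the logits, so the
    # cross-entropy target is 0 for every example in the batch.
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```

Minimizing this loss forces the model to rank the congruent image above every adversarial alternative, which is one standard way to keep visual input from being ignored during training.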
Pages: 18
Related Papers
50 items in total
  • [21] On the Adversarial Robustness of Multi-Modal Foundation Models
    Schlarmann, Christian
    Hein, Matthias
    arXiv, 2023
  • [22] On the Adversarial Robustness of Multi-Modal Foundation Models
    Schlarmann, Christian
    Hein, Matthias
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023: 3679-3687
  • [23] Multi-Modal Adversarial Example Detection with Transformer
    Ding, Chaoyue
    Sun, Shiliang
    Zhao, Jing
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [24] Entity-level Cross-modal Learning Improves Multi-modal Machine Translation
    Huang, Xin
    Zhang, Jiajun
    Zong, Chengqing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021: 1067-1080
  • [25] Multi-modal Contrastive Learning for Healthcare Data Analytics
    Li, Rui
    Gao, Jing
    2022 IEEE 10TH INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2022), 2022: 120-127
  • [26] Turbo your multi-modal classification with contrastive learning
    Zhang, Zhiyu
    Liu, Da
    Liu, Shengqiang
    Wang, Anna
    Gao, Jie
    Li, Yali
    INTERSPEECH 2023, 2023: 1848-1852
  • [27] Contrastive Multi-Modal Knowledge Graph Representation Learning
    Fang, Quan
    Zhang, Xiaowei
    Hu, Jun
    Wu, Xian
    Xu, Changsheng
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35(9): 8983-8996
  • [28] Deep contrastive representation learning for multi-modal clustering
    Lu, Yang
    Li, Qin
    Zhang, Xiangdong
    Gao, Quanxue
    NEUROCOMPUTING, 2024, 581
  • [29] Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation
    Abdulmumin, Idris
    Dash, Satya Ranjan
    Dawud, Musa Abdullahi
    Parida, Shantipriya
    Muhammad, Shamsuddeen Hassan
    Ahmad, Ibrahim Sa'id
    Panda, Subhadarshi
    Bojar, Ondrej
    Galadanci, Bashir Shehu
    Bello, Shehu Bello
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022: 6471-6479
  • [30] Visual Topic Semantic Enhanced Machine Translation for Multi-Modal Data Efficiency
    Wang, Chao
    Cai, Si-Jia
    Shi, Bei-Xiang
    Chong, Zhi-Hong
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2023, 38: 1223-1236