Progressive modality-complement aggregative multitransformer for domain multi-modal neural machine translation

Cited by: 2
Authors
Guo, Junjun [1 ,2 ]
Hou, Zhenyu [1 ,2 ]
Xian, Yantuan [1 ,2 ]
Yu, Zhengtao [1 ,2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
[2] Yunnan Key Lab Artificial Intelligence, Kunming 650504, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Domain multi-modal neural machine translation; Multi-modal transformer; Progressive modality-complement; Modality-specific mask;
DOI
10.1016/j.patcog.2024.110294
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Domain-specific Multi-modal Neural Machine Translation (DMNMT) aims to translate domain-specific sentences from a source language to a target language by incorporating text-related visual information. Generally, domain-specific text-image data complement each other and have the potential to collaboratively enhance the representation of domain-specific information. Unfortunately, there is a considerable modality gap between image and text in both data format and semantic expression, which poses distinctive challenges for domain-text translation tasks. Narrowing the modality gap and improving domain-aware representation are two critical challenges in DMNMT. To this end, this paper proposes a progressive modality-complement aggregative MultiTransformer, which aims to simultaneously narrow the modality gap and capture domain-specific multi-modal representation. We first adopt a bidirectional progressive cross-modal interactive strategy to effectively align the text-to-text, text-to-visual, and visual-to-text semantics in the multi-modal representation space by integrating visual and text information layer by layer. Subsequently, we introduce a modality-complement MultiTransformer based on progressive cross-modal interaction to extract the domain-related multi-modal representation, thereby enhancing machine translation performance. Experiments on the Fashion-MMT and Multi-30k datasets show that the proposed approach outperforms the compared state-of-the-art (SOTA) methods on the En-Zh task in the E-commerce domain and on the En-De, En-Fr, and En-Cs tasks of Multi-30k in the general domain. An in-depth analysis confirms the validity of the proposed modality-complement MultiTransformer and the bidirectional progressive cross-modal interactive strategy for DMNMT.
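The bidirectional progressive cross-modal interaction described in the abstract can be sketched in miniature as follows. This is an illustrative, dependency-free Python toy, not the authors' implementation: it uses single-head dot-product attention and a scalar mixing weight `alpha`, and the function names and fusion rule are assumptions made for the sketch. At each layer, text features attend to visual features and vice versa, and each stream mixes the attended complement into its own representation, so the two modalities converge gradually rather than in a single fusion step.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attend(queries, keys_values):
    """Single-head scaled dot-product attention: each query vector
    attends over keys_values and returns the weighted sum of them."""
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * k[d] for w, k in zip(weights, keys_values))
                    for d in range(len(q))])
    return out

def progressive_fusion(text_feats, visual_feats, num_layers=3, alpha=0.5):
    """Layer-by-layer bidirectional interaction: per layer, text attends
    to vision (text-to-visual) and vision attends to text (visual-to-text),
    then each stream interpolates the attended complement into itself."""
    t, v = text_feats, visual_feats
    for _ in range(num_layers):
        t2v = cross_attend(t, v)  # text queries over visual keys/values
        v2t = cross_attend(v, t)  # visual queries over text keys/values
        t = [[(1 - alpha) * ti + alpha * ci for ti, ci in zip(tok, ctx)]
             for tok, ctx in zip(t, t2v)]
        v = [[(1 - alpha) * vi + alpha * ci for vi, ci in zip(reg, ctx)]
             for reg, ctx in zip(v, v2t)]
    return t, v
```

In the paper's full model this interaction feeds a modality-complement MultiTransformer; here the point is only that repeated attend-and-mix steps pull the two representation spaces toward each other layer by layer.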
Pages: 12
Related Papers
50 records
  • [21] HybridVocab: Towards Multi-Modal Machine Translation via Multi-Aspect Alignment
    Peng, Ru
    Zeng, Yawen
    Zhao, Junbo
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 380 - 388
  • [22] Entity-level Cross-modal Learning Improves Multi-modal Machine Translation
    Huang, Xin
    Zhang, Jiajun
    Zong, Chengqing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1067 - 1080
  • [23] Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation
    Abdulmumin, Idris
    Dash, Satya Ranjan
    Dawud, Musa Abdullahi
    Parida, Shantipriya
    Muhammad, Shamsuddeen Hassan
    Ahmad, Ibrahim Sa'id
    Panda, Subhadarshi
    Bojar, Ondrej
    Galadanci, Bashir Shehu
    Bello, Shehu Bello
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6471 - 6479
  • [25] Noise-Robust Semi-supervised Multi-modal Machine Translation
    Li, Lin
    Hu, Kaixi
    Tayir, Turghun
    Liu, Jianquan
    Lee, Kong Aik
    PRICAI 2022: TRENDS IN ARTIFICIAL INTELLIGENCE, PT II, 2022, 13630 : 155 - 168
  • [26] Hindi Visual Genome: A Dataset for Multi-Modal English to Hindi Machine Translation
    Parida, Shantipriya
    Bojar, Ondrej
    Dash, Satya Ranjan
    COMPUTACION Y SISTEMAS, 2019, 23 (04): : 1499 - 1505
  • [27] Probing Multi-modal Machine Translation with Pre-trained Language Model
    Kong, Yawei
    Fan, Kai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3689 - 3699
  • [28] Visual Topic Semantic Enhanced Machine Translation for Multi-Modal Data Efficiency
    Wang, Chao
    Cai, Si-Jia
    Shi, Bei-Xiang
    Chong, Zhi-Hong
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2023, 38 (06) : 1223 - 1236
  • [29] Modality-convolutions: Multi-modal Gesture Recognition Based on Convolutional Neural Network
    Huo, Da
    Chen, Yufeng
    Li, Fengxia
    Lei, Zhengchao
    2017 12TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND EDUCATION (ICCSE 2017), 2017, : 349 - 353
  • [30] Multi-grained visual pivot-guided multi-modal neural machine translation with text-aware cross-modal contrastive disentangling
    Guo, Junjun
    Su, Rui
    Ye, Junjie
    NEURAL NETWORKS, 2024, 178