Progressive modality-complement aggregative multitransformer for domain multi-modal neural machine translation

Cited by: 2
Authors
Guo, Junjun [1 ,2 ]
Hou, Zhenyu [1 ,2 ]
Xian, Yantuan [1 ,2 ]
Yu, Zhengtao [1 ,2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Peoples R China
[2] Yunnan Key Lab Artificial Intelligence, Kunming 650504, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Domain multi-modal neural machine translation; Multi-modal transformer; Progressive modality-complement; Modality-specific mask
DOI
10.1016/j.patcog.2024.110294
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Domain-specific Multi-modal Neural Machine Translation (DMNMT) aims to translate domain-specific sentences from a source language to a target language by incorporating text-related visual information. Generally, domain-specific text-image data complement each other and have the potential to collaboratively enhance the representation of domain-specific information. Unfortunately, there is a considerable modality gap between image and text in data format and semantic expression, which poses distinctive challenges for domain-text translation tasks. Narrowing the modality gap and improving domain-aware representation are two critical challenges in DMNMT. To this end, this paper proposes a progressive modality-complement aggregative MultiTransformer, which aims to simultaneously narrow the modality gap and capture domain-specific multi-modal representation. We first adopt a bidirectional progressive cross-modal interactive strategy to effectively narrow the text-to-text, text-to-visual, and visual-to-text semantic gaps in the multi-modal representation space by integrating visual and text information layer by layer. Subsequently, we introduce a modality-complement MultiTransformer based on progressive cross-modal interaction to extract the domain-related multi-modal representation, thereby enhancing machine translation performance. Experiments are conducted on the Fashion-MMT and Multi-30k datasets, and the results show that the proposed approach outperforms the compared state-of-the-art (SOTA) methods on the En-Zh task in the E-commerce domain and on the En-De, En-Fr, and En-Cs tasks of Multi-30k in the general domain. An in-depth analysis confirms the validity of the proposed modality-complement MultiTransformer and the bidirectional progressive cross-modal interactive strategy for DMNMT.
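The abstract describes a layer-by-layer bidirectional cross-modal interaction between the text and visual streams, followed by a modality-complement aggregation. The following is a minimal PyTorch sketch, not the authors' released implementation, of what one such progressive interaction layer could look like; the class name ProgressiveCrossModalLayer, the gated fusion used to approximate the modality-complement aggregation, and all dimensions are illustrative assumptions.

```python
# Minimal sketch (assumed design, not the paper's code) of one progressive
# cross-modal interaction layer: text-to-text self-attention plus bidirectional
# text<->visual cross-attention, with a gated fusion that lets visual evidence
# complement the text stream.
import torch
import torch.nn as nn


class ProgressiveCrossModalLayer(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.text_self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.text_to_img = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.img_to_text = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)  # modality-complement gate (assumed form)
        self.norm_t = nn.LayerNorm(d_model)
        self.norm_v = nn.LayerNorm(d_model)

    def forward(self, text, vis):
        # text: (B, T, d) token features; vis: (B, R, d) image region features
        t, _ = self.text_self_attn(text, text, text)   # text-to-text interaction
        t_cm, _ = self.text_to_img(t, vis, vis)        # text queries visual context
        v_cm, _ = self.img_to_text(vis, t, t)          # visual queries text context
        # Gated aggregation: decide per dimension how much visual complement to keep.
        g = torch.sigmoid(self.gate(torch.cat([t, t_cm], dim=-1)))
        text_out = self.norm_t(text + g * t_cm + (1 - g) * t)
        vis_out = self.norm_v(vis + v_cm)
        return text_out, vis_out


# Stacking several such layers gives the layer-by-layer ("progressive")
# integration the abstract refers to; the final text stream would then feed a
# standard Transformer decoder for translation.
layers = nn.ModuleList(ProgressiveCrossModalLayer() for _ in range(6))
text = torch.randn(2, 20, 512)  # toy batch: 2 sentences, 20 tokens
vis = torch.randn(2, 49, 512)   # toy batch: 2 images, 49 region features
for layer in layers:
    text, vis = layer(text, vis)
```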
Pages: 12
Related Papers
50 records in total
  • [1] Layer-Level Progressive Transformer With Modality Difference Awareness for Multi-Modal Neural Machine Translation
    Guo, Junjun
    Ye, Junjie
    Xiang, Yan
    Yu, Zhengtao
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3015 - 3026
  • [2] Unsupervised Multi-modal Neural Machine Translation
    Su, Yuanhang
    Fan, Kai
    Nguyen Bach
    Kuo, C-C Jay
    Huang, Fei
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10474 - 10483
  • [3] Multi-modal neural machine translation with deep semantic interactions
    Su, Jinsong
    Chen, Jinchang
    Jiang, Hui
    Zhou, Chulun
    Lin, Huan
    Ge, Yubin
    Wu, Qingqiang
    Lai, Yongxuan
    INFORMATION SCIENCES, 2021, 554 : 47 - 60
  • [4] Multi-modal graph contrastive encoding for neural machine translation
    Yin, Yongjing
    Zeng, Jiali
    Su, Jinsong
    Zhou, Chulun
    Meng, Fandong
    Zhou, Jie
    Huang, Degen
    Luo, Jiebo
    ARTIFICIAL INTELLIGENCE, 2023, 323
  • [5] Learning to decode to future success for multi-modal neural machine translation
    Huang, Yan
    Zhang, TianYuan
    Xu, Chun
JOURNAL OF ENGINEERING RESEARCH, 2023, 11 (02)
  • [6] Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
    Calixto, Iacer
    Liu, Qun
    Campbell, Nick
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 1913 - 1924
  • [7] Measuring Modality Utilization in Multi-Modal Neural Networks
    Singh, Saurav
    Markopoulos, Panos P.
    Saber, Eli
    Lew, Jesse D.
    Heard, Jamison
    2023 IEEE CONFERENCE ON ARTIFICIAL INTELLIGENCE, CAI, 2023, : 11 - 14
  • [8] An error analysis for image-based multi-modal neural machine translation
    Calixto, Iacer
    Liu, Qun
    MACHINE TRANSLATION, 2019, 33 (1-2) : 155 - 177
  • [9] RetrievalMMT: Retrieval-Constrained Multi-Modal Prompt Learning for Multi-Modal Machine Translation
    Wang, Yan
    Zeng, Yawen
    Liang, Junjie
    Xing, Xiaofen
    Xu, Jin
    Xu, Xiangmin
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 860 - 868
  • [10] Video Pivoting Unsupervised Multi-Modal Machine Translation
    Li, Mingjie
    Huang, Po-Yao
    Chang, Xiaojun
    Hu, Junjie
    Yang, Yi
    Hauptmann, Alex
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (03) : 3918 - 3932