MPMRC-MNER: A Unified MRC framework for Multimodal Named Entity Recognition based Multimodal Prompt

Cited by: 2
Authors
Bao, Xigang [1]
Tian, Mengyuan [1]
Zha, Zhiyuan [1]
Qin, Biao [1]
Affiliation
[1] Renmin University of China, School of Information, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Multimodal Named Entity Recognition; Multimodal Prompt; Contrastive Learning
DOI
10.1145/3583780.3614975
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal named entity recognition (MNER) is a vision-language task that aims to detect entity spans in a sentence-image pair and classify them into their corresponding entity types. Existing methods often treat an image as a set of visual objects and try to explicitly capture the relations between these objects and the entities. However, because visual objects often differ from entities in both number and type, such methods may be misled by the bias the visual objects introduce instead of being helped by them. Inspired by the success of textual prompt-based fine-tuning (PF) approaches, in this paper we propose MPMRC-MNER, a Multimodal Prompt-based Machine Reading Comprehension (MRC) framework that implicitly aligns text and image to improve MNER. Specifically, we transform the text-only query used in MRC into a multimodal prompt containing both image tokens and text tokens. To better integrate the image tokens and text tokens, we design a prompt-aware attention mechanism for cross-modal fusion. Finally, we design contrastive learning with two types of contrastive losses to learn more consistent representations of the two modalities and to reduce noise. Extensive experiments and analyses on two public MNER datasets, Twitter2015 and Twitter2017, show that our model outperforms state-of-the-art methods.
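The abstract does not specify the two contrastive losses. As a rough illustration of what an image-text contrastive objective looks like in general, here is a minimal sketch of a standard symmetric InfoNCE (CLIP-style) loss over a batch of paired text and image embeddings. It is an assumed stand-in, not the authors' formulation: the function name `symmetric_info_nce`, the plain-list embedding representation, and the default temperature of 0.07 are all hypothetical choices made for illustration.

```python
import math
from typing import List

Vector = List[float]


def _l2_normalize(vec: Vector) -> Vector:
    # Scale to unit length so dot products become cosine similarities;
    # an all-zero vector is returned unchanged to avoid dividing by zero.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _log_softmax_at(logits: List[float], index: int) -> float:
    # Numerically stable log(softmax(logits)[index]) via the log-sum-exp trick.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return logits[index] - log_z


def symmetric_info_nce(text_emb: List[Vector], image_emb: List[Vector],
                       temperature: float = 0.07) -> float:
    """Mean of the text->image and image->text InfoNCE losses.

    Hypothetical stand-in: text_emb[i] and image_emb[i] are assumed to come
    from the same sentence-image pair (the positive); every other pairing in
    the batch serves as a negative.
    """
    if not text_emb or len(text_emb) != len(image_emb):
        raise ValueError("expected two non-empty batches of equal size")
    texts = [_l2_normalize(v) for v in text_emb]
    images = [_l2_normalize(v) for v in image_emb]
    n = len(texts)
    # sim[i][j] = cos(text_i, image_j) / temperature
    sim = [[sum(a * b for a, b in zip(texts[i], images[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Row i: text i must pick out image i among all images in the batch.
    text_to_image = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    # Column i: image i must pick out text i among all texts in the batch.
    image_to_text = -sum(_log_softmax_at([sim[j][i] for j in range(n)], i)
                         for i in range(n)) / n
    return (text_to_image + image_to_text) / 2


if __name__ == "__main__":
    texts = [[1.0, 0.0], [0.0, 1.0]]
    # Correctly paired images versus the same images in swapped order.
    print(symmetric_info_nce(texts, [[1.0, 0.0], [0.0, 1.0]]))
    print(symmetric_info_nce(texts, [[0.0, 1.0], [1.0, 0.0]]))
```

With correctly paired inputs the loss is close to zero, while swapping the images makes it large; minimizing it therefore pulls each sentence toward its own image and away from the other images in the batch, which is one common way to encourage "more consistent representations of two modalities".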
Pages: 47-56
Number of pages: 10
Related papers (50 in total)
  • [1] MNER-QG: An End-to-End MRC Framework for Multimodal Named Entity Recognition with Query Grounding
    Jia, Meihuizi
    Shen, Lei
    Shen, Xin
    Liao, Lejian
    Chen, Meng
    He, Xiaodong
    Chen, Zhendong
    Li, Jiaqi
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 8032 - 8040
  • [2] Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition
    Jia, Meihuizi
    Shen, Xin
    Shen, Lei
    Pang, Jinhui
    Liao, Lejian
    Song, Yang
    Chen, Meng
    He, Xiaodong
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3549 - 3558
  • [3] P-MNER: Cross Modal Correction Fusion Network with Prompt Learning for Multimodal Named Entity Recognition
    Wang, Zhuang
    Zhang, Yijia
    An, Kang
    Zhou, Xiaoying
    Lu, Mingyu
    Lin, Hongfei
    CHINESE COMPUTATIONAL LINGUISTICS, CCL 2023, 2023, 14232 : 207 - 221
  • [4] Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer
    Yu, Jianfei
    Jiang, Jing
    Yang, Li
    Xia, Rui
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3342 - 3352
  • [5] Prompting ChatGPT in MNER: Enhanced Multimodal Named Entity Recognition with Auxiliary Refined Knowledge
    Li, Jinyuan
    Li, Han
    Pan, Zhuo
    Sun, Di
    Wang, Jiahao
    Zhang, Wenkun
    Pan, Gang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 2787 - 2802
  • [6] A Survey on Multimodal Named Entity Recognition
    Qian, Shenyi
    Jin, Wenduo
    Chen, Yonggang
    Ma, Jiangtao
    Qiao, Yaqiong
    Lu, Jinyu
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT IV, 2023, 14089 : 609 - 622
  • [7] A multi-task framework based on decomposition for multimodal named entity recognition
    Cai, Chenran
    Wang, Qianlong
    Qin, Bing
    Xu, Ruifeng
    NEUROCOMPUTING, 2024, 604
  • [8] GNN-Based Multimodal Named Entity Recognition
    Gong, Yunchao
    Lv, Xueqiang
    Yuan, Zhu
    You, Xindong
    Hu, Feng
    Chen, Yuzhong
    COMPUTER JOURNAL, 2024, 67 (08): : 2622 - 2632
  • [9] A Multi-expert Collaborative Framework for Multimodal Named Entity Recognition
    Xu, Bo
    Jiang, Haiqi
    Wei, Shouang
    Du, Ming
    Song, Hui
    Wang, Hongya
    MULTIMEDIA MODELING, MMM 2025, PT I, 2025, 15520 : 30 - 43
  • [10] MAF: A General Matching and Alignment Framework for Multimodal Named Entity Recognition
    Xu, Bo
    Huang, Shizhou
    Sha, Chaofeng
    Wang, Hongya
    WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2022, : 1215 - 1223