MPMRC-MNER: A Unified MRC framework for Multimodal Named Entity Recognition based Multimodal Prompt

Cited by: 2
Authors
Bao, Xigang [1]
Tian, Mengyuan [1]
Zha, Zhiyuan [1]
Qin, Biao [1]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal Named Entity Recognition; Multimodal Prompt; Contrastive Learning;
DOI
10.1145/3583780.3614975
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal named entity recognition (MNER) is a vision-language task that aims to detect entity spans and classify them into the corresponding entity types, given a sentence-image pair. Existing methods often regard an image as a set of visual objects and try to explicitly capture the relations between visual objects and entities. However, since visual objects often differ from entities in both quantity and type, these methods may suffer from the bias introduced by visual objects rather than benefit from them. Inspired by the success of textual prompt-based fine-tuning (PF) approaches, in this paper we propose MPMRC-MNER, a multimodal prompt-based machine reading comprehension (MRC) framework that implicitly aligns text and image to improve MNER. Specifically, we transform the text-only query in MRC into a multimodal prompt containing image tokens and text tokens. To better integrate image tokens and text tokens, we design a prompt-aware attention mechanism for cross-modal fusion. Finally, contrastive learning with two types of contrastive losses is designed to learn more consistent representations of the two modalities and to reduce noise. Extensive experiments and analyses on two public MNER datasets, Twitter2015 and Twitter2017, demonstrate that our model outperforms state-of-the-art methods.
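As a rough illustration of the query transformation the abstract describes, the sketch below builds one MRC-style input per entity type by prefixing a text query with placeholder image tokens. This is only a minimal sketch under stated assumptions: the query wording, the `[IMG_i]` placeholders, the entity-type set, the special tokens, and all function names are hypothetical and are not taken from the paper, which does not specify them in this record.

```python
from typing import Dict, List

# Hypothetical query phrases, one per entity type (assumed wording and type set).
TYPE_QUERIES: Dict[str, str] = {
    "PER": "person names",
    "LOC": "locations",
    "ORG": "organizations",
    "MISC": "miscellaneous entities",
}


def build_multimodal_prompt(entity_type: str, num_image_tokens: int) -> List[str]:
    """Prefix the text query with placeholder image tokens.

    In a real model each "[IMG_i]" slot would hold a visual embedding;
    here the slots are plain strings standing in for image tokens.
    """
    image_tokens = [f"[IMG_{i}]" for i in range(num_image_tokens)]
    text_tokens = f"find {TYPE_QUERIES[entity_type]} in the text".split()
    return image_tokens + text_tokens


def build_mrc_inputs(sentence: str, num_image_tokens: int = 2) -> Dict[str, List[str]]:
    """Return one token sequence (prompt followed by sentence) per entity type."""
    context = sentence.split()  # whitespace tokenization, for illustration only
    return {
        etype: ["[CLS]"]
        + build_multimodal_prompt(etype, num_image_tokens)
        + ["[SEP]"]
        + context
        + ["[SEP]"]
        for etype in TYPE_QUERIES
    }


inputs = build_mrc_inputs("Kevin Durant enters Oracle Arena")
for etype, tokens in inputs.items():
    print(etype, " ".join(tokens))
```

In an MRC formulation of this kind, a span-extraction head would then predict start and end positions over the sentence part of each sequence; the prompt-aware attention and the contrastive losses mentioned in the abstract are not modeled here.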
Pages: 47-56
Page count: 10
Related papers
50 in total
  • [21] Judicial nested named entity recognition method with MRC framework
    Zhang H.
    Guo J.
    Wang Y.
    Zhang Z.
    Zhao H.
    International Journal of Cognitive Computing in Engineering, 2023, 4 : 118 - 126
  • [22] MMBERT: a unified framework for biomedical named entity recognition
    Lei Fu
    Zuquan Weng
    Jiheng Zhang
    Haihe Xie
    Yiqing Cao
    Medical & Biological Engineering & Computing, 2024, 62 : 327 - 341
  • [23] Multimodal Named Entity Recognition with Bottleneck Fusion and Contrastive Learning
    Wang, Peng
    Chen, Xiaohang
    Shang, Ziyu
    Ke, Wenjun
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (04) : 545 - 555
  • [24] MMBERT: a unified framework for biomedical named entity recognition
    Fu, Lei
    Weng, Zuquan
    Zhang, Jiheng
    Xie, Haihe
    Cao, Yiqing
    MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2024, 62 (01) : 327 - 341
  • [25] Semantics Fusion of Hierarchical Transformers for Multimodal Named Entity Recognition
    Tong, Zhao
    Liu, Qiang
    Shi, Haichao
    Xia, Yuwei
    Wu, Shu
    Zhang, Xiao-Yu
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14877 : 414 - 426
  • [26] Explicit Sparse Attention Network for Multimodal Named Entity Recognition
    Liu, Yunfei
    Li, Shengyang
    Hu, Feihu
    Liu, Anqi
    Liu, Yanan
    KNOWLEDGE GRAPH AND SEMANTIC COMPUTING: KNOWLEDGE GRAPH EMPOWERS THE DIGITAL ECONOMY, CCKS 2022, 2022, 1669 : 83 - 94
  • [27] Multimodal Named Entity Recognition with Image Attributes and Image Knowledge
    Chen, Dawei
    Li, Zhixu
    Gu, Binbin
    Chen, Zhigang
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 186 - 201
  • [28] CRISP: A cross-modal integration framework based on the surprisingly popular algorithm for multimodal named entity recognition
    Liu, Haitao
    Xin, Xianwei
    Song, Jihua
    Peng, Weiming
    NEUROCOMPUTING, 2025, 614
  • [29] ESPVR: Entity Spans Position Visual Regions for Multimodal Named Entity Recognition
    Li, Xiujiao
    Sun, Guanglu
    Liu, Xinyu
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 7785 - 7794
  • [30] MVPN: Multi-granularity visual prompt-guided fusion network for multimodal named entity recognition
    Liu, Wei
    Ren, Aiqun
    Wang, Chao
    Peng, Yan
    Xie, Shaorong
    Li, Weimin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (28) : 71639 - 71663