MPMRC-MNER: A Unified MRC framework for Multimodal Named Entity Recognition based Multimodal Prompt

Cited by: 2
Authors
Bao, Xigang [1 ]
Tian, Mengyuan [1 ]
Zha, Zhiyuan [1 ]
Qin, Biao [1 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal Named Entity Recognition; Multimodal Prompt; Contrastive Learning;
DOI
10.1145/3583780.3614975
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Multimodal named entity recognition (MNER) is a vision-language task that aims to detect entity spans and classify them into their corresponding entity types given a sentence-image pair. Existing methods often regard an image as a set of visual objects and try to explicitly capture the relations between visual objects and entities. However, since visual objects often differ from entities in both quantity and type, these methods may suffer from the bias introduced by visual objects rather than benefit from them. Inspired by the success of textual prompt-based fine-tuning (PF) approaches on many tasks, in this paper we propose a Multimodal Prompt-based Machine Reading Comprehension framework, namely MPMRC-MNER, which implicitly aligns text and image to improve MNER. Specifically, we transform the text-only query in MRC into a multimodal prompt containing both image tokens and text tokens. To better integrate the two kinds of tokens, we design a prompt-aware attention mechanism for cross-modal fusion. Finally, contrastive learning with two types of contrastive losses is used to learn more consistent representations of the two modalities and to reduce noise. Extensive experiments and analyses on two public MNER datasets, Twitter2015 and Twitter2017, demonstrate that our model outperforms state-of-the-art methods.
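To make the two ingredients named in the abstract concrete, the sketch below is a minimal, illustrative PyTorch rendering, not the authors' released code: it prepends projected image tokens to the embedded text query to form a multimodal prompt, and pairs it with a standard symmetric InfoNCE-style contrastive loss of the kind the "two contrastive losses" could instantiate. All names (MultimodalPrompt, info_nce, n_img_tokens), dimensions, and the choice of InfoNCE are assumptions for illustration, not details taken from the paper.

    # Illustrative sketch only; module names and dimensions are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultimodalPrompt(nn.Module):
        """Builds an MRC-style multimodal prompt: [image tokens ; text query tokens]."""
        def __init__(self, img_dim=2048, txt_dim=768, n_img_tokens=4):
            super().__init__()
            # Project a pooled image feature into n_img_tokens text-space embeddings
            self.proj = nn.Linear(img_dim, txt_dim * n_img_tokens)
            self.n_img_tokens = n_img_tokens
            self.txt_dim = txt_dim

        def forward(self, img_feat, query_emb):
            # img_feat:  (B, img_dim)     global image feature, e.g. from a CNN encoder
            # query_emb: (B, L, txt_dim)  embedded tokens of the text-only MRC query
            img_tokens = self.proj(img_feat).view(-1, self.n_img_tokens, self.txt_dim)
            # Concatenate along the sequence axis to form the multimodal prompt
            return torch.cat([img_tokens, query_emb], dim=1)

    def info_nce(txt_repr, img_repr, temperature=0.07):
        """Symmetric text<->image contrastive loss over a batch.
        Matched (text, image) pairs sit on the diagonal of the similarity matrix."""
        t = F.normalize(txt_repr, dim=-1)
        v = F.normalize(img_repr, dim=-1)
        logits = t @ v.t() / temperature                      # (B, B) similarities
        labels = torch.arange(t.size(0), device=t.device)     # diagonal = positives
        return (F.cross_entropy(logits, labels) +
                F.cross_entropy(logits.t(), labels)) / 2

Under this reading, the prompt-aware attention described in the paper would then attend over the concatenated sequence so that text tokens can condition on the injected image tokens during fusion; the sketch stops short of that module since its exact form is specific to the paper.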
Pages: 47-56
Page count: 10