MPMRC-MNER: A Unified MRC framework for Multimodal Named Entity Recognition based Multimodal Prompt

Cited by: 2
Authors
Bao, Xigang [1]
Tian, Mengyuan [1]
Zha, Zhiyuan [1]
Qin, Biao [1]
Affiliations
[1] Renmin University of China, School of Information, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal Named Entity Recognition; Multimodal Prompt; Contrastive Learning;
DOI
10.1145/3583780.3614975
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal named entity recognition (MNER) is a vision-language task that aims to detect entity spans and classify them into the corresponding entity types, given a sentence-image pair. Existing methods often regard an image as a set of visual objects and try to explicitly capture the relations between visual objects and entities. However, since visual objects often differ from entities in both quantity and type, these methods may suffer from the bias that visual objects introduce rather than benefit from them. Inspired by the success of textual prompt-based fine-tuning (PF) approaches in many tasks, in this paper we propose MPMRC-MNER, a Multimodal Prompt-based Machine Reading Comprehension framework that implicitly aligns text and image to improve MNER. Specifically, we transform the text-only query in MRC into a multimodal prompt containing image tokens and text tokens. To better integrate image tokens and text tokens, we design a prompt-aware attention mechanism for cross-modal fusion. Finally, contrastive learning with two types of contrastive losses is designed to learn more consistent representations of the two modalities and to reduce noise. Extensive experiments and analyses on two public MNER datasets, Twitter2015 and Twitter2017, demonstrate that our model outperforms state-of-the-art methods.
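The MRC formulation mentioned in the abstract can be illustrated with a minimal sketch: one query is issued per entity type, image tokens are prepended to form a multimodal prompt, and the answer is a set of token spans. Everything concrete here is an assumption made for illustration and is not taken from the paper: the query wording, the `[IMG0]`-style placeholder tokens, the type set `PER`/`LOC`/`ORG`/`MISC`, the nearest-end span matching, and the function names `build_inputs`, `decode_spans`, and `extract_entities`. The start/end flags stand in for what a trained model would predict.

```python
from typing import Dict, List, Tuple

# Assumed query wording; the abstract does not give the actual prompts.
TYPE_QUERIES: Dict[str, str] = {
    "PER": "Find person names in the text.",
    "LOC": "Find locations in the text.",
    "ORG": "Find organizations in the text.",
    "MISC": "Find other named entities in the text.",
}


def build_inputs(tokens: List[str], num_image_tokens: int = 2) -> Dict[str, List[str]]:
    """Build one multimodal prompt per entity type.

    Placeholder image tokens come first, then the type query, then the sentence.
    In a real model the image tokens would be visual features, not strings.
    """
    image_tokens = [f"[IMG{i}]" for i in range(num_image_tokens)]
    return {
        etype: image_tokens + query.split() + ["[SEP]"] + tokens
        for etype, query in TYPE_QUERIES.items()
    }


def decode_spans(starts: List[int], ends: List[int]) -> List[Tuple[int, int]]:
    """Pair each flagged start with the nearest flagged end at or after it."""
    spans = []
    for i, is_start in enumerate(starts):
        if not is_start:
            continue
        for j in range(i, len(ends)):
            if ends[j]:
                spans.append((i, j))
                break
    return spans


def extract_entities(
    tokens: List[str],
    predictions: Dict[str, Tuple[List[int], List[int]]],
) -> List[Tuple[str, str]]:
    """Turn per-type start/end flags into (entity text, entity type) pairs."""
    entities = []
    for etype, (starts, ends) in predictions.items():
        for i, j in decode_spans(starts, ends):
            entities.append((" ".join(tokens[i : j + 1]), etype))
    return entities


if __name__ == "__main__":
    sentence = ["Kevin", "Durant", "joins", "the", "Warriors"]
    # Hand-written flags standing in for model output.
    fake_predictions = {
        "PER": ([1, 0, 0, 0, 0], [0, 1, 0, 0, 0]),
        "ORG": ([0, 0, 0, 0, 1], [0, 0, 0, 0, 1]),
    }
    print(build_inputs(sentence)["PER"])
    print(extract_entities(sentence, fake_predictions))
```

Framing each entity type as its own query lets the type description act as a prompt, which is the slot where the abstract's image tokens are added; the prompt-aware attention and the contrastive losses are not covered by this sketch.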
Pages: 47-56
Number of pages: 10
Related papers
50 in total
  • [41] Multimodal Named Entity Recognition and Relation Extraction with Retrieval-Augmented Strategy
    Hu, Xuming
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 3488 - 3488
  • [42] UniMEL: A Unified Framework for Multimodal Entity Linking with Large Language Models
    Liu, Qi
    He, Yongyi
    Xu, Tong
    Lian, Defu
    Liu, Che
    Zheng, Zhi
    Chen, Enhong
    International Conference on Information and Knowledge Management, Proceedings, : 1909 - 1919
  • [43] A Biomedical Named Entity Recognition Framework with Multi-granularity Prompt Tuning
    Liu, Zhuoya
    Chi, Tang
    Zhang, Peiliang
    Wu, Xiaoting
    Che, Chao
    HEALTH INFORMATION PROCESSING, CHIP 2022, 2023, 1772 : 95 - 105
  • [44] Chinese medical named entity recognition based on multi-granularity semantic dictionary and multimodal tree
    Wang, Caiyu
    Wang, Hong
    Zhuang, Hui
    Li, Wei
    Han, Shu
    Zhang, Hui
    Zhuang, Luhe
    JOURNAL OF BIOMEDICAL INFORMATICS, 2020, 111 (111)
  • [45] Prompt-Based Self-training Framework for Few-Shot Named Entity Recognition
    Huang, Ganghong
    Zhong, Jiang
    Wang, Chen
    Dai, Qizhu
    Li, Rongzhen
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2022, PT III, 2022, 13370 : 91 - 103
  • [46] MAFN: multi-level attention fusion network for multimodal named entity recognition
    Xiaoying Zhou
    Yijia Zhang
    Zhuang Wang
    Mingyu Lu
    Xiaoxia Liu
    Multimedia Tools and Applications, 2024, 83 : 45047 - 45058
  • [47] On development of multimodal named entity recognition using part-of-speech and mixture of experts
    Chen, Jianying
    Xue, Yun
    Zhang, Haolan
    Ding, Weiping
    Zhang, Zhengxuan
    Chen, Jiehai
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (06) : 2181 - 2192
  • [48] UAMNer: uncertainty-aware multimodal named entity recognition in social media posts
    Luping Liu
    Meiling Wang
    Mozhi Zhang
    Linbo Qing
    Xiaohai He
    Applied Intelligence, 2022, 52 : 4109 - 4125
  • [49] CLGLF: Confidence Learning Guides Label Fusion for Multimodal Named Entity Recognition Method
    Wang, Hai-Rong
    Wang, Tong
    Xu, Xi
    Jing, Bo-Xiang
    Chen, Fang-Ping
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2024, 52 (07) : 2429 - 2437
  • [50] Multi-scale Visual Semantic Enhancement for Multimodal Named Entity Recognition Method
    Wang H.-R.
    Xu X.
    Wang T.
    Chen F.-P.
    Zidonghua Xuebao/Acta Automatica Sinica, 2024, 50 (06) : 1234 - 1245