MPMRC-MNER: A Unified MRC framework for Multimodal Named Entity Recognition based Multimodal Prompt

Cited by: 2
Authors
Bao, Xigang [1 ]
Tian, Mengyuan [1 ]
Zha, Zhiyuan [1 ]
Qin, Biao [1 ]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal Named Entity Recognition; Multimodal Prompt; Contrastive Learning;
DOI
10.1145/3583780.3614975
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Multimodal named entity recognition (MNER) is a vision-language task that aims to detect entity spans and classify them into their corresponding entity types given a sentence-image pair. Existing methods often regard an image as a set of visual objects and try to explicitly capture the relations between visual objects and entities. However, since visual objects often differ from entities in both quantity and type, these methods may suffer from the bias introduced by visual objects rather than benefit from them. Inspired by the success of textual prompt-based fine-tuning (PF) approaches on many tasks, in this paper we propose a Multimodal Prompt-based Machine Reading Comprehension framework, namely MPMRC-MNER, which implicitly aligns text and image to improve MNER. Specifically, we transform the text-only query in MRC into a multimodal prompt containing both image tokens and text tokens. To better integrate the two kinds of tokens, we design a prompt-aware attention mechanism for cross-modal fusion. Finally, contrastive learning with two types of contrastive losses is used to learn more consistent representations of the two modalities and to reduce noise. Extensive experiments and analyses on two public MNER datasets, Twitter2015 and Twitter2017, demonstrate that our model outperforms state-of-the-art methods.
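To make the two ingredients named in the abstract concrete, the sketch below is a minimal, illustrative PyTorch rendering, not the authors' released code: it prepends projected image tokens to the embedded text query to form a multimodal prompt, and pairs it with a standard symmetric InfoNCE-style contrastive loss of the kind the "two contrastive losses" could instantiate. All names (MultimodalPrompt, info_nce, n_img_tokens), dimensions, and the choice of InfoNCE are assumptions for illustration, not details taken from the paper.

    # Illustrative sketch only; module names and dimensions are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultimodalPrompt(nn.Module):
        """Builds an MRC-style multimodal prompt: [image tokens ; text query tokens]."""
        def __init__(self, img_dim=2048, txt_dim=768, n_img_tokens=4):
            super().__init__()
            # Project a pooled image feature into n_img_tokens text-space embeddings
            self.proj = nn.Linear(img_dim, txt_dim * n_img_tokens)
            self.n_img_tokens = n_img_tokens
            self.txt_dim = txt_dim

        def forward(self, img_feat, query_emb):
            # img_feat:  (B, img_dim)     global image feature, e.g. from a CNN encoder
            # query_emb: (B, L, txt_dim)  embedded tokens of the text-only MRC query
            img_tokens = self.proj(img_feat).view(-1, self.n_img_tokens, self.txt_dim)
            # Concatenate along the sequence axis to form the multimodal prompt
            return torch.cat([img_tokens, query_emb], dim=1)

    def info_nce(txt_repr, img_repr, temperature=0.07):
        """Symmetric text<->image contrastive loss over a batch.
        Matched (text, image) pairs sit on the diagonal of the similarity matrix."""
        t = F.normalize(txt_repr, dim=-1)
        v = F.normalize(img_repr, dim=-1)
        logits = t @ v.t() / temperature                      # (B, B) similarities
        labels = torch.arange(t.size(0), device=t.device)     # diagonal = positives
        return (F.cross_entropy(logits, labels) +
                F.cross_entropy(logits.t(), labels)) / 2

Under this reading, the prompt-aware attention described in the paper would then attend over the concatenated sequence so that text tokens can condition on the injected image tokens during fusion; the sketch stops short of that module since its exact form is specific to the paper.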
Pages: 47-56
Page count: 10