Query Prior Matters: A MRC Framework for Multimodal Named Entity Recognition

Cited by: 20
Authors
Jia, Meihuizi [1, 2]
Shen, Xin [3]
Shen, Lei [2]
Pang, Jinhui [1]
Liao, Lejian [1]
Song, Yang [2]
Chen, Meng [2]
He, Xiaodong [2]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] JD AI, Beijing, Peoples R China
[3] Australian Natl Univ, Canberra, ACT, Australia
Funding
National Key Research and Development Program of China
Keywords
multimodal named entity recognition; machine reading comprehension; visual grounding; transfer learning
DOI
10.1145/3503161.3548427
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Multimodal named entity recognition (MNER) is a vision-language task where the system is required to detect entity spans and corresponding entity types given a sentence-image pair. Existing methods capture text-image relations with various attention mechanisms that only obtain implicit alignments between entity types and image regions. To locate regions more accurately and better model cross-/within-modal relations, we propose a machine reading comprehension based framework for MNER, namely MRC-MNER. By utilizing queries in MRC, our framework can provide prior information about entity types and image regions. Specifically, we design two stages, Query-Guided Visual Grounding and Multi-Level Modal Interaction, to align fine-grained type-region information and simulate text-image/inner-text interactions respectively. For the former, we train a visual grounding model via transfer learning to extract region candidates that can be further integrated into the second stage to enhance token representations. For the latter, we design text-image and inner-text interaction modules along with three sub-tasks for MRC-MNER. To verify the effectiveness of our model, we conduct extensive experiments on two public MNER datasets, Twitter2015 and Twitter2017. Experimental results show that MRC-MNER outperforms the current state-of-the-art models on Twitter2017, and yields competitive results on Twitter2015.
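The MRC framing described in the abstract can be illustrated with a minimal text-only sketch: the sentence is paired with one natural-language query per entity type, and spans are decoded from per-token start/end predictions. Everything here is an assumption made for illustration and is not taken from the paper: the query wording, the helper names `build_mrc_inputs` and `decode_spans`, and the nearest-end matching rule. The visual grounding stage and the interaction modules are not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical queries, one per entity type. The four types are the ones
# annotated in Twitter2015/Twitter2017; the wording is an assumption.
TYPE_QUERIES: Dict[str, str] = {
    "PER": "Person: names of people.",
    "LOC": "Location: countries, cities, and other places.",
    "ORG": "Organization: companies, teams, and institutions.",
    "MISC": "Miscellaneous: other named entities such as events or products.",
}


def build_mrc_inputs(tokens: List[str]) -> List[Tuple[str, str, List[str]]]:
    """Pair the sentence with one query per entity type (one MRC instance each)."""
    return [(etype, query, tokens) for etype, query in TYPE_QUERIES.items()]


def decode_spans(start_flags: List[int], end_flags: List[int]) -> List[Tuple[int, int]]:
    """Match every predicted start token with the nearest end token at or after it.

    This nearest-end heuristic is an illustrative assumption, not the
    span-matching procedure of MRC-MNER.
    """
    spans = []
    for i, is_start in enumerate(start_flags):
        if not is_start:
            continue
        for j in range(i, len(end_flags)):
            if end_flags[j]:
                spans.append((i, j))  # inclusive token indices
                break
    return spans


if __name__ == "__main__":
    sentence = ["Kevin", "Durant", "joins", "Warriors"]
    for etype, query, tokens in build_mrc_inputs(sentence):
        print(etype, "|", query, "|", " ".join(tokens))
    # Hand-written start/end flags standing in for model predictions.
    print(decode_spans([1, 0, 0, 0], [0, 1, 0, 0]))
```

Because each query names a single entity type, any span decoded for that instance inherits the type from the query, which is how a query can act as a prior over entity types.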
Pages: 3549-3558
Number of pages: 10