CAG: A Consistency-Adaptive Text-Image Alignment Generation for Joint Multimodal Entity-Relation Extraction

Cited: 0
Authors
Yang, Xinjie [1 ]
Gong, Xiaocheng [1 ]
Tang, Binghao [1 ]
Lei, Yang [1 ]
Deng, Yayue [1 ]
Ouyang, Huan [1 ]
Zhao, Gang [1 ]
Luo, Lei [1 ]
Feng, Yunling [1 ]
Duan, Bin [1 ]
Li, Si [1 ]
Xu, Yajing [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Keywords
Multimodal Alignment; Contrastive Learning; Joint Multimodal Entity-Relation Extraction; Instruction Tuning; RECOGNITION
DOI
10.1145/3627673.3679883
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Joint Multimodal Entity-Relation Extraction (JMERE) aims to extract entity-relation triples from the text of given image-text pairs. As a joint multimodal information extraction task, it has attracted increasing research interest. Previous works on JMERE typically utilize graph networks to align textual entities with visual objects and achieve promising performance. However, these methods do not account for the inconsistency between text and image, and such direct alignment can limit the performance of JMERE models. In this paper, we propose a Consistency-adaptive text-image Alignment Generation (CAG) framework for various text-image consistency scenarios. Specifically, we propose a Consistency Factor (CF) to measure the consistency between images and texts. We also design consistency-adaptive contrastive learning based on CF, which reduces the impact of inconsistent visual and textual information. Additionally, we adopt JMERE-specific instruction tuning for better entity-relation triple generation. Experimental results on the JMERE dataset demonstrate that our proposed CAG is effective and achieves state-of-the-art performance.
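The abstract describes two mechanisms concretely enough to sketch: a Consistency Factor (CF) that scores image-text agreement, and a contrastive objective modulated by that score. Below is a minimal, hypothetical PyTorch sketch of one way these could be realized; the function names, the choice of cosine similarity over paired encoder embeddings, and the per-pair down-weighting scheme are all assumptions of this illustration, not the authors' released implementation.

```python
# Hypothetical sketch only: CF as rescaled image-text cosine similarity,
# then a CF-weighted in-batch InfoNCE loss. Names and the weighting
# scheme are assumptions, not the paper's actual code.
import torch
import torch.nn.functional as F


def consistency_factor(img_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
    """Score each image-text pair's agreement in [0, 1].

    img_emb, txt_emb: (batch, dim) embeddings of paired images and texts
    from any joint encoder (e.g., CLIP). Cosine similarity lies in
    [-1, 1]; rescaling maps it to [0, 1], where 1 = fully consistent.
    """
    cos = F.cosine_similarity(img_emb, txt_emb, dim=-1)  # (batch,)
    return (cos + 1.0) / 2.0


def cf_weighted_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """In-batch InfoNCE where each pair's loss term is scaled by its CF,
    so inconsistent image-text pairs pull less on the alignment."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                  # (batch, batch)
    targets = torch.arange(img.size(0), device=img.device)
    per_pair = F.cross_entropy(logits, targets, reduction="none")
    cf = consistency_factor(img_emb, txt_emb).detach()    # stop gradient through the weight
    return (cf * per_pair).mean()


# Usage with random stand-in embeddings:
if __name__ == "__main__":
    img_emb, txt_emb = torch.randn(8, 512), torch.randn(8, 512)
    print(cf_weighted_contrastive_loss(img_emb, txt_emb).item())
```

Detaching the CF before weighting keeps the consistency estimate from being optimized merely to shrink the loss; whether the paper applies such a stop-gradient is not stated in the abstract.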
Pages: 4183-4187
Page count: 5
Related Papers
6 records in total
  • [1] A Fine-Grained Network for Joint Multimodal Entity-Relation Extraction
    Yuan, Li
    Cai, Yi
    Xu, Jingyu
    Li, Qing
    Wang, Tao
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2025, 37(1): 1-14
  • [2] A knowledge-enhanced network for joint multimodal entity-relation extraction
    Huang, Shubin
    Cai, Yi
    Yuan, Li
    Wang, Jiexin
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62(3)
  • [3] Joint Multimodal Entity-Relation Extraction Based on Edge-Enhanced Graph Alignment Network and Word-Pair Relation Tagging
    Yuan, Li
    Cai, Yi
    Wang, Jin
    Li, Qing
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023: 11051-11059
  • [4] Joint multimodal entity-relation extraction based on temporal enhancement and similarity-gated attention
    Wang, Guoxiang
    Liu, Jin
    Xie, Jialong
    Zhu, Zhenwei
    Zhou, Fengyu
    KNOWLEDGE-BASED SYSTEMS, 2024, 304
  • [5] An Entity-Relation Joint Extraction Method Based on Two Independent Sub-Modules From Unstructured Text
    Liu, Su
    Lyu, Wenqi
    Ma, Xiao
    Ge, Jike
    IEEE ACCESS, 2023, 11: 122154-122163
  • [6] The more quality information the better: Hierarchical generation of multi-evidence alignment and fusion model for multimodal entity and relation extraction
    He, Xinyu
    Li, Shixin
    Zhang, Yuning
    Li, Binhe
    Xu, Sifan
    Zhou, Yuqing
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62(1)