Enhancing Visual Information Extraction with Large Language Models Through Layout-Aware Instruction Tuning

Cited by: 0
Authors
Li, Teng [1 ]
Wang, Jiapeng [1 ]
Jin, Lianwen [1 ]
Affiliations
[1] South China University of Technology, Guangzhou, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Visual Information Extraction; Large Language Model; Instruction Tuning;
DOI
10.1007/978-981-97-8511-7_20
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recently, leveraging large language models (LLMs) for visually rich document information extraction has made significant progress. Previous studies have simplified visual information extraction into a document visual question answering task, in which each question-answer exchange yields a single entity, serving mainly to validate the document-understanding capabilities of LLMs. However, these methods incur substantial computational cost and inefficiency when multiple entities must be extracted from a single document, a scenario common in practical document digitization. This paper builds upon a large language model and incorporates document layout information through a document layout modeling branch. We also design a layout-aware, task-specific instruction set. To further strengthen the model's ability to learn document layout, we first augment the tokenizer's vocabulary and then fine-tune the entire model, so that it adapts to the expanded vocabulary and extracts document layout features effectively. By harnessing the exceptional language comprehension capabilities of LLMs, our model can perform comprehensive entity extraction over an entire document in a single pass. Owing to the generative nature of LLMs, a single model can also handle multiple downstream visual information extraction tasks. Experimental results demonstrate consistent improvements over the baseline model across a range of document visual information extraction tasks.
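As a rough illustration of the vocabulary-augmentation and layout-aware instruction ideas described in the abstract, the sketch below shows how such steps could look with the HuggingFace Transformers API. This is a minimal sketch, not the authors' method: the backbone name, the coordinate-bin token scheme, and the prompt format are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch, NOT the paper's released code: it illustrates augmenting a
# tokenizer's vocabulary with layout tokens and resizing the embedding matrix
# before full-model fine-tuning, using the HuggingFace Transformers API.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed backbone; the paper does not necessarily use this model.
BACKBONE = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(BACKBONE)
model = AutoModelForCausalLM.from_pretrained(BACKBONE)

# Hypothetical layout tokens: x/y coordinates quantized into 100 bins each,
# so every OCR line can be tagged with its bounding box in token form.
layout_tokens = [f"<x_{i}>" for i in range(100)] + [f"<y_{i}>" for i in range(100)]
tokenizer.add_tokens(layout_tokens, special_tokens=True)

# Give the new tokens trainable embedding rows; subsequent full-model
# fine-tuning lets these embeddings (and the rest of the network) learn
# layout semantics.
model.resize_token_embeddings(len(tokenizer))

# Hypothetical layout-aware instruction for single-pass, multi-entity
# extraction: the model sees the whole document once and returns all fields.
instruction = (
    "Each OCR line below is followed by its quantized bounding box. "
    "Extract ALL key-value entities from the document and output them as JSON.\n"
    "Invoice No: 12345 <x_10> <y_5> <x_42> <y_8>\n"
    "Date: 2024-01-01 <x_10> <y_12> <x_38> <y_15>\n"
)
```

Under these assumptions, a single forward generation over the full prompt would yield all entities at once, in contrast to the one-question-per-entity QA formulation the abstract argues against.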
Pages: 276-289 (14 pages)