Context Compression and Extraction: Efficiency Inference of Large Language Models

Times Cited: 0
Authors
Zhou, Junyao [1 ]
Du, Ruiqing [1 ]
Tan, Yushan [2 ]
Yang, Jintao [2 ]
Yang, Zonghao [2 ]
Luo, Wei [2 ]
Luo, Zhunchen [2 ]
Zhou, Xian [2 ]
Hu, Wenpeng [2 ]
Affiliations
[1] Hebei Univ Engn, Sch Informat & Elect Engn, Handan 056000, Peoples R China
[2] Acad Mil Sci Peoples Liberation Army, Beijing 100000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
self-information; mutual information; context compression; large language model
DOI
10.1007/978-981-97-5663-6_19
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Large language models have shown great capability in dealing with long contexts. However, when applied to question-answering tasks, excessively long contexts inevitably contain redundant information, which can lead to the loss of significant details. It is therefore challenging to retain the information relevant to the user's query intent in long contexts. To address this problem, our study proposes a novel Context Compression and Extraction (CCE) technique that takes the impact of the user query into account. CCE computes the mutual information between the query and its context and integrates it with self-information to preserve query-relevant information in the compressed context. We have validated our approach on diverse datasets that require integrated context processing capabilities, such as an arXiv paper dataset and a news article dataset. Our method is effective across various tasks, including summarization, question answering, and reconstruction of the original context. Experimental results confirm the superior performance of our method over a strong baseline across several evaluation metrics, significantly enhancing the quality of text generated in downstream tasks.
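To make the abstract's idea concrete, the following is a minimal sketch of query-aware context compression in the spirit described: each context sentence is scored by its token-level self-information (surprisal under a small causal language model) plus a pointwise-mutual-information-style term measuring how much the query raises the sentence's likelihood, and the highest-scoring sentences are retained. The scoring rule, weights, and all function names here are illustrative assumptions, not the paper's published algorithm.

```python
# Hedged sketch of self-information + query/context mutual-information scoring.
# Assumes a Hugging Face causal LM (gpt2); the exact CCE formulation may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()


def sequence_log_prob(text: str, prefix: str = "") -> float:
    """Sum of log p(token | preceding tokens) for `text`, optionally conditioned on `prefix`."""
    enc = tokenizer(prefix + text, return_tensors="pt")
    prefix_len = len(tokenizer(prefix)["input_ids"]) if prefix else 0
    with torch.no_grad():
        logits = model(**enc).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = enc["input_ids"][0, 1:]
    token_lp = log_probs[torch.arange(targets.size(0)), targets]
    # Count only the tokens belonging to `text`, not the conditioning prefix.
    return token_lp[max(prefix_len - 1, 0):].sum().item()


def compress(context_sentences, query, keep_ratio=0.5, alpha=1.0):
    """Keep the sentences with the highest self-information plus PMI(sentence, query)."""
    scored = []
    for sent in context_sentences:
        self_info = -sequence_log_prob(sent)  # surprisal of the sentence on its own
        pmi = sequence_log_prob(sent, prefix=query + " ") - sequence_log_prob(sent)
        scored.append((self_info + alpha * pmi, sent))
    k = max(1, int(len(scored) * keep_ratio))
    kept = {s for _, s in sorted(scored, reverse=True)[:k]}
    # Preserve the original ordering of the retained sentences.
    return " ".join(s for s in context_sentences if s in kept)
```

As a usage example, calling `compress(sentences, "What dataset does the paper use?", keep_ratio=0.3)` would return roughly the top 30% of sentences most surprising on their own and most boosted by the query, which is one simple way to preserve query-relevant detail while shortening the context.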
Pages: 221-232
Page count: 12