Context Compression and Extraction: Efficiency Inference of Large Language Models

Cited by: 0

Authors:

Zhou, Junyao [1]

Du, Ruiqing [1]

Tan, Yushan [2]

Yang, Jintao [2]

Yang, Zonghao [2]

Luo, Wei [2]

Luo, Zhunchen [2]

Zhou, Xian [2]

Hu, Wenpeng [2]

Affiliations:

[1] Hebei Univ Engn, Sch Informat & Elect Engn, Handan 056000, Peoples R China

[2] Acad Mil Sci Peoples Liberat Army, Beijing 1000000, Peoples R China

Source:

ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT I, ICIC 2024 | 2024 / Vol. 14875

Funding:

National Natural Science Foundation of China;

Keywords:

self-information; mutual-information; context compression; large language model;

DOI:

10.1007/978-981-97-5663-6_19

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Large language models have shown great capability in dealing with long contexts. However, when applied to question-and-answer response tasks, excessively long contexts unavoidably contain redundant information, which could potentially lead to a loss of significant details. Therefore it is a challenge to retain the information related to the user's query intent in long contexts. To address this problem, our study proposes a novel Context Compression and Extraction (CCE) technique, which takes the impact of the user query into account. CCE computes the mutual information between the query and its context, integrating this with self-information to preserve query-relevant information in the compressed context. We have validated our approach across diverse datasets that require integrated context processing capabilities, such as the arXiv paper dataset and news article dataset. Our methodology exhibits efficacy in various tasks, including summarization, question-answering, and the reconstruction of original contexts. Experimental results validate the superior performance of our method compared to a strong baseline across several evaluation metrics, significantly enhancing the quality of text generated in downstream tasks.
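The abstract names two quantities, self-information and query-context mutual information, but this record does not give the paper's formulas. The following is only a minimal sketch of how such scores could be combined to filter a context, not the authors' CCE implementation. The function names, the weight `alpha`, the `keep_ratio` parameter, and the toy probabilities are all hypothetical; in practice the probabilities would come from a language model evaluated with and without the query.

```python
import math

def self_information(p):
    """Self-information of an event with probability p, in bits: -log2 p."""
    return -math.log2(p)

def pmi(p_given_query, p):
    """Pointwise mutual information in bits: log2( p(x | q) / p(x) )."""
    return math.log2(p_given_query / p)

def compress(units, p_plain, p_query, keep_ratio=0.5, alpha=0.5):
    """Keep the highest-scoring units, preserving their original order.

    Assumed scoring rule (not taken from the paper):
        score = alpha * self_information + (1 - alpha) * PMI
    """
    scores = [
        alpha * self_information(p_plain[u])
        + (1 - alpha) * pmi(p_query[u], p_plain[u])
        for u in units
    ]
    k = max(1, round(len(units) * keep_ratio))
    ranked = sorted(range(len(units)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])  # restore document order
    return [units[i] for i in kept]

# Hypothetical toy probabilities, chosen by hand for illustration only.
units = ["the", "model", "compresses", "context"]
p_plain = {"the": 0.5, "model": 0.125, "compresses": 0.0625, "context": 0.125}
p_query = {"the": 0.5, "model": 0.25, "compresses": 0.03125, "context": 0.5}

compressed = compress(units, p_plain, p_query)
```

In this sketch, a unit that is both surprising on its own and made more likely by the query scores highest, which is one plausible reading of "integrating" the two measures.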

Pages: 221-232

Page count: 12

