MindLLM: Lightweight large language model pre-training, evaluation and domain application

Cited by: 0
|
Authors
Yang, Yizhe
Sun, Huashan
Li, Jiawei
Liu, Runheng
Li, Yinghao
Liu, Yuhang
Gao, Yang
Huang, Heyan [1]
Institution
[1] Beijing Institute of Technology, School of Computer Science, Beijing, People's Republic of China
Source
AI OPEN | 2024, Vol. 5
Funding
National Natural Science Foundation of China;
Keywords
Large language model; Lightweight; Bilingual;
DOI
10.1016/j.aiopen.2024.08.001
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While progress towards general artificial intelligence is commonly pursued by training ever larger models, a complementary direction is to develop lightweight custom models that better serve specific domains, given the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models trained from scratch, which alleviates these burdens by offering models with 1.3 billion and 3 billion parameters. We give a thorough account of the experience accrued during model development, covering every step of the process, including data construction, model architecture, evaluation, and application, and we hope these insights prove valuable to fellow academics and developers. MindLLM consistently matches or surpasses the performance of larger open-source models on some public benchmarks. We also introduce an instruction tuning framework tailored to smaller models that enhances their capabilities efficiently. Moreover, we explore the application of MindLLM in vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.
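The abstract describes 1.3B and 3B parameter bilingual, instruction-tuned models applied to vertical domains such as law and finance. As a purely illustrative sketch (the checkpoint identifier below is a placeholder and not the authors' released model name), loading such a lightweight causal LM with the Hugging Face transformers library and issuing a domain-flavoured instruction might look like this:

# Illustrative only: load a small bilingual causal LM and run an
# instruction-style prompt. The model identifier is hypothetical;
# substitute the actual released MindLLM checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "org/mindllm-1b3-instruct"  # placeholder identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A Chinese legal-domain instruction, reflecting the bilingual, vertical-domain focus.
prompt = "请简要说明合同违约的主要法律后果。"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))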
Pages: 155-180
Page count: 26
Related Papers
50 records in total
  • [1] Lightweight Model Pre-Training via Language Guided Knowledge Distillation
    Li, Mingsheng
    Zhang, Lin
    Zhu, Mingzhen
    Huang, Zilong
    Yu, Gang
    Fan, Jiayuan
    Chen, Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 10720 - 10730
  • [2] Subset selection for domain adaptive pre-training of language model
    Hwang, Junha
    Lee, Seungdong
    Kim, Haneul
    Jeong, Young-Seob
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [3] Pre-training and Evaluation of Numeracy-oriented Language Model
    Feng, Fuli
    Rui, Xilin
    Wang, Wenjie
    Cao, Yixin
    Chua, Tat-Seng
    ICAIF 2021: THE SECOND ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, 2021,
  • [4] Evaluation of pre-training large language models on leadership-class supercomputers
    Yin, Junqi
    Dash, Sajal
    Gounley, John
    Wang, Feiyi
    Tourassi, Georgia
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (18): 20747 - 20768
  • [5] QUERT: Continual Pre-training of Language Model for Query Understanding in Travel Domain Search
    Xie, Jian
    Liang, Yidan
    Liu, Jingping
    Xiao, Yanghua
    Wu, Baohua
    Ni, Shenghua
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 5282 - 5291
  • [6] An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training
    Arumae, Kristjan
    Sun, Qing
    Bhatia, Parminder
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 4854 - 4864
  • [7] Domain-Specific Language Model Pre-Training for Korean Tax Law Classification
    Gu, Yeong Hyeon
    Piao, Xianghua
    Yin, Helin
    Jin, Dong
    Zheng, Ri
    Yoo, Seong Joon
    IEEE ACCESS, 2022, 10 : 46342 - 46353
  • [8] FlauBERT: Unsupervised Language Model Pre-training for French
    Le, Hang
    Vial, Loic
    Frej, Jibril
    Segonne, Vincent
    Coavoux, Maximin
    Lecouteux, Benjamin
    Allauzen, Alexandre
    Crabbe, Benoit
    Besacier, Laurent
    Schwab, Didier
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 2479 - 2490
  • [9] Soft Language Clustering for Multilingual Model Pre-training
    Zeng, Jiali
    Jiang, Yufan
    Yin, Yongjing
    Jing, Yi
    Meng, Fandong
    Lin, Binghuai
    Cao, Yunbo
    Zhou, Jie
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 7021 - 7035