MindLLM: Lightweight large language model pre-training, evaluation and domain application

Cited by: 0
|
Authors
Yang, Yizhe
Sun, Huashan
Li, Jiawei
Liu, Runheng
Li, Yinghao
Liu, Yuhang
Gao, Yang
Huang, Heyan [1 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
Source
AI OPEN | 2024, Vol. 5
Funding
National Natural Science Foundation of China;
Keywords
Large language model; Lightweight; Bilingual;
DOI
10.1016/j.aiopen.2024.08.001
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While progress toward general artificial intelligence is commonly pursued by scaling models ever larger, another branch is to develop lightweight custom models that better serve specific domains, given the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models trained from scratch, which alleviate such burdens by offering models with 1.3 billion and 3 billion parameters. We give a thorough account of the experience accrued during large-model development, covering every step of the process, including data construction, model architecture, evaluation, and applications. Such insights will hopefully be valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of larger open-source models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.
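To make the instruction-tuning step concrete, the following is a minimal sketch of supervised instruction tuning for a small causal language model using Hugging Face transformers and PyTorch. It illustrates the general recipe only, not MindLLM's specific framework described in the paper: the checkpoint name, prompt template, toy data, and hyperparameters are placeholders chosen for illustration.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" stands in for a lightweight (~1-3B parameter) bilingual checkpoint;
# no assumption is made about the actual MindLLM weights or identifiers.
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Toy instruction/response pairs; a real run would use a curated bilingual
# instruction set of the kind the paper's data-construction discussion covers.
examples = [
    {"instruction": "Summarize: training large models is costly.",
     "response": "Large-model training is expensive."},
]

def format_example(ex):
    # A simple prompt template; the paper's own template may differ.
    return (f"Instruction: {ex['instruction']}\n"
            f"Response: {ex['response']}{tokenizer.eos_token}")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(1):
    for ex in examples:
        batch = tokenizer(format_example(ex), return_tensors="pt",
                          truncation=True, max_length=512)
        # Standard causal-LM objective: predict each next token,
        # so the labels are simply the input ids.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()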
Pages: 155-180
Page count: 26
Related papers
Total: 50
  • [41] Survey on Vision-language Pre-training
    Yin J.
    Zhang Z.-D.
    Gao Y.-H.
    Yang Z.-W.
    Li L.
    Xiao M.
    Sun Y.-Q.
    Yan C.-G.
Ruan Jian Xue Bao/Journal of Software, 2023, 34(05): 2000-2023
  • [42] Pre-training Language Models for Comparative Reasoning
    Yu, Mengxia
    Zhang, Zhihan
    Yu, Wenhao
    Jiang, Meng
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023: 12421-12433
  • [43] Sigmoid Loss for Language Image Pre-Training
    Zhai, Xiaohua
    Mustafa, Basil
    Kolesnikov, Alexander
    Beyer, Lucas
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 11941-11952
  • [44] Grounded Language-Image Pre-training
    Li, Liunian Harold
    Zhang, Pengchuan
    Zhang, Haotian
    Yang, Jianwei
    Li, Chunyuan
    Zhong, Yiwu
    Wang, Lijuan
    Yuan, Lu
    Zhang, Lei
    Hwang, Jenq-Neng
    Chang, Kai-Wei
    Gao, Jianfeng
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 10955-10965
  • [45] VILA: On Pre-training for Visual Language Models
    Lin, Ji
    Yin, Hongxu
    Ping, Wei
    Molchanov, Pavlo
    Shoeybi, Mohammad
    Han, Song
2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 26679-26689
  • [46] RELATION ENHANCED VISION LANGUAGE PRE-TRAINING
    Lee, Ju-Hee
    Kang, Je-Won
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022: 2286-2290
  • [47] A New Pre-training Method for Training Deep Learning Models with Application to Spoken Language Understanding
    Celikyilmaz, Asli
    Sarikaya, Ruhi
    Hakkani-Tur, Dilek
    Liu, Xiaohu
    Ramesh, Nikhil
    Tur, Gokhan
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 3255-3259
  • [48] Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks
    Dong, Haoyu
    Cheng, Zhoujun
    He, Xinyi
    Zhou, Mengyu
    Zhou, Anda
    Zhou, Fan
    Liu, Ao
    Han, Shi
    Zhang, Dongmei
PROCEEDINGS OF THE THIRTY-FIRST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2022, 2022: 5426-5435
  • [49] Dialogue-adaptive language model pre-training from quality estimation
    Li, Junlong
    Zhang, Zhuosheng
    Zhao, Hai
NEUROCOMPUTING, 2023, 516: 27-35
  • [50] ChouBERT: Pre-training French Language Model for Crowdsensing with Tweets in Phytosanitary Context
    Jiang, Shufan
    Angarita, Rafael
    Cormier, Stephane
    Orensanz, Julien
    Rousseaux, Francis
RESEARCH CHALLENGES IN INFORMATION SCIENCE, 2022, 446: 653-661