MindLLM: Lightweight large language model pre-training, evaluation and domain application

Cited by: 0
Authors
Yang, Yizhe
Sun, Huashan
Li, Jiawei
Liu, Runheng
Li, Yinghao
Liu, Yuhang
Gao, Yang
Huang, Heyan [1]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
Source
AI OPEN | 2024, Vol. 5
Funding
National Natural Science Foundation of China;
Keywords
Large language model; Light weight; Bilingual;
DOI
10.1016/j.aiopen.2024.08.001
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While progress towards general artificial intelligence is typically pursued by scaling models ever larger, an alternative path is to develop lightweight custom models that better serve specific domains, given the high cost of training and deploying LLMs and the scarcity of computational resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models trained from scratch, which alleviates these burdens by offering models with 1.3 billion and 3 billion parameters. We give a thorough account of the experience accrued during large model development, covering every step of the process, including data construction, model architecture, evaluation, and applications; we hope these insights prove valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of larger open-source models on some public benchmarks. We also introduce an innovative instruction tuning framework tailored to smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.
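The abstract mentions instruction tuning of small bilingual causal language models. As a rough illustration only, the sketch below shows a generic supervised instruction-tuning step with Hugging Face Transformers; it is not the authors' framework, and the checkpoint name, prompt template, and toy data are placeholders assumed for the example.

```python
# Minimal sketch (not the paper's code): one pass of supervised instruction
# tuning on a small bilingual causal LM. Checkpoint name is hypothetical.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "your-org/small-bilingual-lm"  # placeholder for a ~1.3B model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
model.train()

# Toy bilingual instruction-response pairs; real tuning uses a curated corpus.
examples = [
    {"instruction": "Summarize: The cat sat on the mat.",
     "response": "A cat sat on a mat."},
    {"instruction": "Translate to English: 你好，世界。",
     "response": "Hello, world."},
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

for ex in examples:
    # Concatenate instruction and response into one causal-LM sequence.
    # (A fuller setup would mask the instruction tokens out of the loss.)
    text = f"Instruction: {ex['instruction']}\nResponse: {ex['response']}"
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # Standard next-token objective; labels are shifted internally by the model.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```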
Pages: 155 - 180
Number of pages: 26
Related Papers
50 records in total
  • [21] Too Large; Data Reduction for Vision-Language Pre-Training
    Wang, Alex Jinpeng
    Lin, Kevin Qinghong
    Zhang, David Junhao
    Lei, Stan Weixian
    Shou, Mike Zheng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 3124 - 3134
  • [22] Gradual Syntactic Label Replacement for Language Model Pre-Training
    Wang, Yile
    Zhang, Yue
    Li, Peng
    Liu, Yang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 486 - 496
  • [23] Analysing The Impact of Sequence Composition on Language Model Pre-Training
    Zhao, Yu
    Qu, Yuanbin
    Staniszewski, Konrad
    Tworkowski, Szymon
    Liu, Wei
    Milos, Piotr
    Wu, Yuxiang
    Minervini, Pasquale
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 7897 - 7912
  • [24] Learning Better Masking for Better Language Model Pre-training
    Yang, Dongjie
    Zhang, Zhuosheng
    Zhao, Hai
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 7255 - 7267
  • [25] Pre-training Universal Language Representation
    Li, Yian
    Zhao, Hai
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 5122 - 5133
  • [26] K-DLM: A Domain-Adaptive Language Model Pre-Training Framework with Knowledge Graph
    Zou, Jiaxin
    Xie, Zuotong
    Chen, Junhua
    Hou, Jiawei
    Yan, Qiang
    Zheng, Hai-Tao
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT IV, 2023, 14257 : 447 - 459
  • [27] From pre-training to fine-tuning: An in-depth analysis of Large Language Models in the biomedical domain
    Bonfigli, Agnese
    Bacco, Luca
    Merone, Mario
    Dell'Orletta, Felice
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 157
  • [28] A Domain-adaptive Pre-training Approach for Language Bias Detection in News
    Krieger, Jan-David
    Spinde, Timo
    Ruas, Terry
    Kulshrestha, Juhi
    Gipp, Bela
    2022 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL), 2022
  • [29] AlephBERT: Language Model Pre-training and Evaluation from Sub-Word to Sentence Level
    Seker, Amit
    Bandel, Elron
    Bareket, Dan
    Brusilovsky, Idan
    Greenfeld, Refael Shaked
    Tsarfaty, Reut
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 46 - 56
  • [30] Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
    Zhuge, Mingchen
    Gao, Dehong
    Fan, Deng-Ping
    Jin, Linbo
    Chen, Ben
    Zhou, Haoming
    Qiu, Minghui
    Shao, Ling
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 12642 - 12652