MindLLM: Lightweight large language model pre-training, evaluation and domain application

Cited by: 0
Authors
Yang, Yizhe
Sun, Huashan
Li, Jiawei
Liu, Runheng
Li, Yinghao
Liu, Yuhang
Gao, Yang
Huang, Heyan [1]
Institution
[1] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
Source
AI OPEN | 2024 / Vol. 5
Funding
National Natural Science Foundation of China;
Keywords
Large language model; Light weight; Bilingual;
DOI
10.1016/j.aiopen.2024.08.001
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While progress toward general artificial intelligence is commonly pursued by developing ever larger models, the high cost of training and deploying LLMs and the scarcity of computational resources motivate a complementary branch: lightweight, customized models that better serve specific domains. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models trained from scratch, which alleviates these burdens by offering models with 1.3 billion and 3 billion parameters. We give a thorough account of the experience accrued during model development, covering every step of the process, including data construction, model architecture, evaluation, and applications, in the hope that these insights prove valuable to fellow researchers and developers. MindLLM consistently matches or surpasses the performance of larger open-source models on several public benchmarks. We also introduce an instruction-tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.
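For orientation on what "1.3 billion parameters" implies architecturally, the sketch below gives a back-of-the-envelope parameter count for a hypothetical GPT-style decoder-only configuration. The vocabulary size, hidden width, and layer count are illustrative assumptions, not MindLLM's published architecture.

```python
# Illustrative sketch only: rough parameter count for a hypothetical
# decoder-only transformer in the ~1.3B range. All configuration values
# below are assumptions, not MindLLM's published configuration.

def decoder_param_count(vocab_size: int, hidden: int, layers: int, ffn_mult: int = 4) -> int:
    """Approximate parameter count of a decoder-only transformer with tied embeddings."""
    embedding = vocab_size * hidden                  # token embedding, tied with the output head
    attention = 4 * hidden * hidden                  # Q, K, V and output projections (per layer)
    feed_forward = 2 * hidden * (ffn_mult * hidden)  # up- and down-projections (per layer)
    layer_norms = 4 * hidden                         # two LayerNorms, weight + bias (per layer)
    per_layer = attention + feed_forward + layer_norms
    return embedding + layers * per_layer

# A large vocabulary is assumed here to cover both Chinese and English tokens.
total = decoder_param_count(vocab_size=100_000, hidden=2048, layers=22)
print(f"~{total / 1e9:.2f}B parameters")  # ~1.31B
```

Under the same assumptions, widening and deepening the stack (for example, a hidden size near 3072 with roughly 26 layers) lands close to the 3 billion mark, which is how lightweight model families typically span their size tiers.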
Pages: 155 - 180
Number of pages: 26
Related Papers
50 records in total
  • [31] MGeo: Multi-Modal Geographic Language Model Pre-Training
    Ding, Ruixue
    Chen, Boli
    Xie, Pengjun
    Huang, Fei
    Li, Xin
    Zhang, Qiang
    Xu, Yao
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 185 - 194
  • [32] Pre-training Language Model as a Multi-perspective Course Learner
    Chen, Beiduo
    Huang, Shaohan
    Zhang, Zihan
    Guo, Wu
    Ling, Zhenhua
    Huang, Haizhen
    Wei, Furu
    Deng, Weiwei
    Zhang, Qi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 114 - 128
  • [34] Efficient and Large Scale Pre-training Techniques for Japanese Natural Language Processing
    Kasagi, Akihiko
    Asaoka, Masahiro
    Tabuchi, Akihiro
    Oyama, Yosuke
    Honda, Takumi
    Sakai, Yasufumi
    Dang, Thang
    Tabaru, Tsuguchika
    2021 NINTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR 2021), 2021, : 108 - 113
  • [35] Continual Pre-Training of Python Language Model to mT5
    Kajiura, Teruno
    Souma, Nao
    Sato, Miyu
    Kuramitsu, Kimio
Computer Software, 2023, 40 (04): 10 - 21
  • [36] Knowledge distilled pre-training model for vision-language-navigation
    Huang, Bo
    Zhang, Shuai
    Huang, Jitao
    Yu, Yijun
    Shi, Zhicai
    Xiong, Yujie
    APPLIED INTELLIGENCE, 2023, 53 (05) : 5607 - 5619
  • [37] Dict-BERT: Enhancing Language Model Pre-training with Dictionary
    Yu, Wenhao
    Zhu, Chenguang
    Fang, Yuwei
    Yu, Donghan
    Wang, Shuohang
    Xu, Yichong
    Zeng, Michael
    Jiang, Meng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1907 - 1918
  • [38] SAS: Self-Augmentation Strategy for Language Model Pre-training
    Xu, Yifei
    Zhang, Jingqiao
    He, Ru
    Ge, Liangzhu
    Yang, Chao
    Yang, Cheng
    Wu, Ying Nian
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 11586 - 11594
  • [39] MoDNA: Motif-Oriented Pre-training For DNA Language Model
    An, Weizhi
    Guo, Yuzhi
    Bian, Yatao
    Ma, Hehuan
    Yang, Jinyu
    Li, Chunyuan
    Huang, Junzhou
    13TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, BCB 2022, 2022,
  • [40] Research on the Training and Application Methods of a Lightweight Agricultural Domain-Specific Large Language Model Supporting Mandarin Chinese and Uyghur
    Pan, Kun
    Zhang, Xiaogang
    Chen, Liping
    APPLIED SCIENCES-BASEL, 2024, 14 (13):