ArithmeticGPT: empowering small-size large language models with advanced arithmetic skills

Cited: 0
Authors
Liu, Zitao
Zheng, Ying
Yin, Zhibo
Chen, Jiahao
Liu, Tianqiao
Tian, Mi
Luo, Weiqi
Institutions
Funding
National Key R&D Program of China;
Keywords
Large language models; Problem-solving; Math reasoning; Curriculum learning;
DOI
10.1007/s10994-024-06681-1
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models (LLMs) have shown remarkable capabilities in understanding and generating language across a wide range of domains. However, their performance on advanced arithmetic calculation remains a significant challenge, especially for small-size LLMs. In this paper, we therefore propose ArithmeticGPT, a practical framework designed to enhance the advanced arithmetic skills of small-size LLMs. We carefully curate an arithmetic instruction dataset, ArithInstruct, that teaches small-size LLMs to trigger a self-developed internal calculation API for precise computations without explicit instructions. The advanced arithmetic calculation results are seamlessly generated within natural language sentences. Furthermore, we empirically design a practical three-stage strategy for fine-tuning small-size LLMs with ArithInstruct that enables the advanced arithmetic skills while preserving the models' original abilities, such as commonsense reasoning and question answering. We evaluate ArithmeticGPT on six public math-related datasets against 17 state-of-the-art LLM baselines, and the experimental results demonstrate the superiority of our approach. To encourage reproducible research, we make our data and code publicly available at https://github.com/ai4ed/ArithmeticGPT.
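The core mechanism described in the abstract, where the model emits an internal calculator call whose result is spliced back into the generated sentence, can be sketched as a post-processing step. This is a minimal illustration only: the `<<calc(...)>>` marker syntax, the function names, and the use of a restricted `eval` are all assumptions for demonstration, not the paper's actual API format.

```python
import re

# Hypothetical marker syntax; the actual call format used by ArithmeticGPT may differ.
CALL_PATTERN = re.compile(r"<<calc\((.*?)\)>>")

def resolve_calc_calls(text: str) -> str:
    """Replace each embedded calculator call in the model's output with its
    evaluated result, so the arithmetic appears seamlessly in the sentence."""
    def _evaluate(match: re.Match) -> str:
        expr = match.group(1)
        # A production system would use a sandboxed arithmetic parser, not eval().
        value = eval(expr, {"__builtins__": {}}, {})
        return f"{value:.6g}" if isinstance(value, float) else str(value)
    return CALL_PATTERN.sub(_evaluate, text)

# Example: the model generates text containing a call, which is then resolved.
print(resolve_calc_calls("The total cost is <<calc(37*48)>> dollars."))
```

In this pattern the language model only needs to learn *when* to emit a call and *what* expression to place inside it; exact numeric precision is delegated to the external evaluator.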
Pages: 23
Related Papers
50 records total
  • [21] CataLM: empowering catalyst design through large language models
    Wang, Ludi
    Chen, Xueqing
    Du, Yi
    Zhou, Yuanchun
    Gao, Yang
    Cui, Wenjuan
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2025,
  • [22] Empowering Corner Case Detection in Autonomous Vehicles With Multimodal Large Language Models
    Liu, Tianqi
    Qin, Yanjun
    Zhang, Shanghang
    Tao, Xiaoming
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 51 - 55
  • [23] Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights
    Ballout, Mohamad
    Krumnack, Ulf
    Heidemann, Gunther
    Kuehnberger, Kai-Uwe
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT I, NLDB 2024, 2024, 14762 : 32 - 46
  • [24] Correct Imaging of Large-size Terahertz Beams by Small-size Camera without Energy Losses
    Kubarev, Vitaly V.
    2016 41ST INTERNATIONAL CONFERENCE ON INFRARED, MILLIMETER, AND TERAHERTZ WAVES (IRMMW-THZ), 2016,
  • [25] Conditionally Combining Robot Skills using Large Language Models
    Zentner, K. R.
    Julian, Ryan
    Ichter, Brian
    Sukhatme, Gaurav S.
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2024), 2024, : 14046 - 14053
  • [26] Large- and small-size advantages in sneaking behaviour in the dusky frillgoby Bathygobius fuscus
    Takegaki, Takeshi
    Kaneko, Takashi
    Matsumoto, Yukio
    NATURWISSENSCHAFTEN, 2012, 99 (04) : 285 - 289
  • [27] Small-size Pedestrian Detection in Large Scene Based on Fast R-CNN
    Wang, Shengke
    Yang, Na
    Duan, Lianghua
    Liu, Lu
    Dong, Junyu
    NINTH INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2017), 2018, 10615
  • [28] Small-Size Large-Aperture Antenna Using Multilayered Spherical Dielectric Resonators
    Matsumuro, Takayuki
    Ishikawa, Yohei
    Shinohara, Naoki
    2013 7TH EUROPEAN CONFERENCE ON ANTENNAS AND PROPAGATION (EUCAP), 2013, : 3068 - 3072
  • [30] Designability of lattice small-size protein models: is it sufficient to use the compact ground states?
    Yesylevskyy, SO
    Demchenko, AP
    CHEMICAL PHYSICS LETTERS, 2004, 388 (4-6) : 348 - 352