Staged Multi-Strategy Framework With Open-Source Large Language Models for Natural Language to SQL Generation

Citations: 0
Authors
Liu, Chuanlong [1]
Liao, Wei [1]
Xu, Zhen [2]
Affiliations
[1] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai 201620, Peoples R China
[2] Shanghai Univ Engn Sci, Sch Mech & Automot Engn, Shanghai 201620, Peoples R China
Keywords
open-source large language models; pre-trained language models; natural language to SQL; prompt learning
DOI
10.1002/tee.24268
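Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809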
Abstract
In the field of natural language to SQL (NL2SQL), significant progress has been made with large pre-trained language models. However, these models still have deficiencies in terms of their ability to generalize, particularly in open-source Large Language Models (LLMs). Additionally, most research efforts tend to overlook the impact of key column information and data table content on the accuracy of queries during the SQL statement generation process. In this paper, we propose a staged, multi-strategy framework called Key Columns and Table Contents (KCTC). The framework is divided into two stages. Firstly, it uses fixed prompt content to extract SQL key column information from natural language questions, including selected columns and conditioned columns. It also formats the output of column information. Secondly, it combines variable prompt content to guide the model in generating SQL statements. It uses the content of the data table for constraints to reduce the impact of errors in condition values on SQL statements. We conducted experiments on the Chinese dataset TableQA using several open-source LLMs. The results demonstrate that our method significantly improved the execution accuracy of SQL statements, with an average increase of 60.29% and reaching up to 91.22% accuracy. This result validates the effectiveness of our approach. (c) 2025 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
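The two stages described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the prompt wording, and the JSON output format are hypothetical, and fuzzy string matching from Python's standard `difflib` stands in for the table-content constraint, which the abstract does not specify.

```python
import difflib
import json


def build_stage1_prompt(question, columns):
    """Stage 1 (assumed form): a fixed prompt asking for key columns as JSON."""
    return (
        "Given the table columns " + ", ".join(columns) + ", "
        "list the key columns for the question as JSON with keys "
        '"select" and "where".\n'
        "Question: " + question
    )


def parse_key_columns(raw, columns):
    """Parse the model's JSON reply and drop any column not in the schema."""
    data = json.loads(raw)
    return {
        key: [c for c in data.get(key, []) if c in columns]
        for key in ("select", "where")
    }


def constrain_value(value, column_values):
    """Stage 2 (assumed form): snap a predicted condition value to the
    closest value actually present in the column, if one is similar enough."""
    matches = difflib.get_close_matches(value, column_values, n=1, cutoff=0.6)
    return matches[0] if matches else value
```

In this sketch, a model reply that names a column outside the schema is filtered out before SQL generation, and a misspelled condition value such as "Shangai" would be replaced by the stored cell value "Shanghai" before the WHERE clause is built.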
Pages: 10
Related papers
50 in total
  • [21] Evaluation of Open-Source Large Language Models for Metal-Organic Frameworks Research
    Bai, Xuefeng
    Xie, Yabo
    Zhang, Xin
    Han, Honggui
    Li, Jian-Rong
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2024, 64 (13) : 4958 - 4965
  • [22] Fine-Tuning and Evaluating Open-Source Large Language Models for the Army Domain
    Ruiz, Maj Daniel C.
    Sell, John
    arXiv,
  • [23] The pureCMusic (pCM++) framework as open-source music language
    Tarabella, L
    COMPUTER MUSIC MODELING AND RETRIEVAL, 2006, 3902 : 34 - 44
  • [24] Iterative Refactoring of Real-World Open-Source Programs with Large Language Models
    Choi, Jinsu
    An, Gabin
    Yoo, Shin
    SEARCH-BASED SOFTWARE ENGINEERING, SSBSE 2024, 2024, 14767 : 49 - 55
  • [25] Comparing Commercial and Open-Source Large Language Models for Labeling Chest Radiograph Reports
    Dorfner, Felix J.
    Juergensen, Liv
    Donle, Leonhard
    Al Mohamad, Fares
    Bodenmann, Tobias R.
    Cleveland, Mason C.
    Busch, Felix
    Adams, Lisa C.
    Sato, James
    Schultz, Thomas
    Kim, Albert E.
    Merkow, Jameson
    Bressem, Keno K.
    Bridge, Christopher P.
    RADIOLOGY, 2024, 313 (01)
  • [26] A set of open-source tools for Turkish natural language processing
    Coltekin, Cagri
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 1079 - 1086
  • [27] Benchmarking Open-Source Large Language Models on Code-Switched Tagalog-English Retrieval Augmented Generation
    Adoptante, Aunhel John M.
    Castro, Jasper Adrian Dwight, V
    Medrana, Micholo Lanz B.
    Ocampo, Alyssa Patricia B.
    Peramo, Elmer C.
    Miranda, Melissa Ruth M.
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2025, 16 (02) : 233 - 242
  • [28] Benchmarking open-source large language models on Portuguese Revalida multiple-choice questions
    Severino, Joao Victor Bruneti
    de Paula, Pedro Angelo Basei
    Berger, Matheus Nespolo
    Loures, Filipe Silveira
    Todeschini, Solano Amadori
    Roeder, Eduardo Augusto
    Veiga, Maria Han
    Guedes, Murilo
    Marques, Gustavo Lenci
    BMJ HEALTH & CARE INFORMATICS, 2025, 32 (01)
  • [29] Analyzing Women's Contributions to Open-Source Software Projects based on Large Language Models
    Zhuang, Yuqian
    Zhang, Mingya
    Yang, Yiyuan
    Wang, Liang
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 2363 - 2368
  • [30] Need of Fine-Tuned Radiology Aware Open-Source Large Language Models for Neuroradiology
    Ray, Partha Pratim
    CLINICAL NEURORADIOLOGY, 2024,