Staged Multi-Strategy Framework With Open-Source Large Language Models for Natural Language to SQL Generation

Times cited: 0

Authors:

Liu, Chuanlong [1]

Liao, Wei [1]

Xu, Zhen [2]

Affiliations:

[1] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai 201620, Peoples R China

[2] Shanghai Univ Engn Sci, Sch Mech & Automot Engn, Shanghai 201620, Peoples R China

Source:

IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING | 2025

Keywords:

open-source large language models; pre-trained language models; natural language to SQL; prompt learning

DOI:

10.1002/tee.24268

Chinese Library Classification (CLC) numbers:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Subject classification codes:

0808; 0809

Abstract:

In natural language to SQL (NL2SQL), large pre-trained language models have driven significant progress. However, these models still generalize poorly, and the problem is most pronounced in open-source large language models (LLMs). In addition, most existing work overlooks how key column information and table contents affect query accuracy during SQL generation. In this paper, we propose Key Columns and Table Contents (KCTC), a staged, multi-strategy framework that operates in two stages. In the first stage, a fixed prompt is used to extract the SQL key columns, namely the selected columns and the conditioned columns, from the natural language question, and the extracted column information is output in a fixed format. In the second stage, a variable prompt guides the model to generate the SQL statement, and the contents of the data table are used as constraints to reduce errors caused by incorrect condition values. We conducted experiments on the Chinese dataset TableQA using several open-source LLMs. The results show that our method significantly improves the execution accuracy of the generated SQL statements, with an average increase of 60.29% and an execution accuracy of up to 91.22%, validating the effectiveness of our approach. (c) 2025 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
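The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' KCTC implementation: the prompt wording, the JSON output format, all function names, and the use of Python's `difflib.get_close_matches` to snap a condition value to an existing cell value are hypothetical choices made for illustration, and the LLM call itself is omitted. Table rows are assumed to be plain dictionaries keyed by column name.

```python
import difflib
import json

# Hypothetical stage-1 template: fixed wording that asks the model to return
# the selected and conditioned columns in a fixed JSON format.
STAGE1_TEMPLATE = (
    "Table columns: {columns}\n"
    "Question: {question}\n"
    'Return JSON only: {{"select": [...], "where": [...]}}'
)


def build_stage1_prompt(question, columns):
    """Fill the fixed stage-1 template with the question and column names."""
    return STAGE1_TEMPLATE.format(columns=", ".join(columns), question=question)


def parse_key_columns(model_output, columns):
    """Parse the formatted stage-1 output, dropping names that are not real columns."""
    data = json.loads(model_output)
    valid = set(columns)
    select = [c for c in data.get("select", []) if c in valid]
    where = [c for c in data.get("where", []) if c in valid]
    return select, where


def build_stage2_prompt(question, table_name, select, where, rows, max_values=20):
    """Hypothetical variable stage-2 prompt: key columns plus the distinct cell
    values of each conditioned column, so the model can copy them verbatim."""
    lines = [
        f"Table: {table_name}",
        f"Question: {question}",
        f"Selected columns: {', '.join(select)}",
        f"Condition columns: {', '.join(where)}",
    ]
    for col in where:
        values = sorted({str(row[col]) for row in rows})[:max_values]
        lines.append(f"Values of {col}: {', '.join(values)}")
    lines.append("Write one SQL query.")
    return "\n".join(lines)


def snap_value(value, column, rows, cutoff=0.6):
    """Assumed post-processing step: replace a condition value with the most
    similar existing cell value in `column`, or keep it if none is close enough."""
    candidates = sorted({str(row[column]) for row in rows})
    match = difflib.get_close_matches(str(value), candidates, n=1, cutoff=cutoff)
    return match[0] if match else value
```

For example, with rows `[{"city": "Shanghai", "sales": 120}, {"city": "Beijing", "sales": 95}]`, a misspelled condition value such as `"Shanghi"` is snapped to `"Shanghai"`, while an unrelated value such as `"Tokyo"` is left unchanged. Listing candidate values in the prompt and snapping afterwards are two independent ways of letting table contents constrain condition values; which (if either) matches the paper's exact procedure is not stated in this record.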

Pages: 10