Staged Multi-Strategy Framework With Open-Source Large Language Models for Natural Language to SQL Generation

Cited by: 0

Authors:

Liu, Chuanlong [1]

Liao, Wei [1]

Xu, Zhen [2]

Affiliations:

[1] Shanghai Univ Engn Sci, Sch Elect & Elect Engn, Shanghai 201620, Peoples R China

[2] Shanghai Univ Engn Sci, Sch Mech & Automot Engn, Shanghai 201620, Peoples R China

Source:

IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING | 2025

Keywords:

open-source large language models; pre-trained language models; natural language to SQL; prompt learning

DOI:

10.1002/tee.24268

Chinese Library Classification (CLC) codes:

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

In natural language to SQL (NL2SQL), large pre-trained language models have driven significant progress. However, these models still generalize poorly, and the problem is particularly pronounced for open-source large language models (LLMs). In addition, most research overlooks how key column information and data table content affect query accuracy during SQL generation. In this paper, we propose a staged, multi-strategy framework called Key Columns and Table Contents (KCTC). The framework has two stages. First, it uses fixed prompt content to extract SQL key column information from the natural language question, including selected columns and conditioned columns, and formats the extracted column information. Second, it combines variable prompt content to guide the model in generating SQL statements, using the content of the data table as a constraint to reduce the impact of erroneous condition values on the SQL statements. We conducted experiments on the Chinese dataset TableQA using several open-source LLMs. The results show that our method significantly improved the execution accuracy of SQL statements, with an average increase of 60.29% and a maximum accuracy of 91.22%, which validates the effectiveness of our approach. © 2025 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
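
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the JSON output format, the fuzzy-matching step, and every function name (`build_stage1_prompt`, `parse_key_columns`, `snap_to_table_value`, `build_stage2_prompt`) are assumptions made for illustration, and the call to the language model itself is left out.

```python
import difflib
import json

# Hypothetical fixed instruction for stage 1; the paper's actual prompt
# wording is not given in this record.
STAGE1_TEMPLATE = (
    "Table columns: {columns}\n"
    "Question: {question}\n"
    'Return JSON with keys "select" and "where" listing the relevant column names.'
)


def build_stage1_prompt(question, columns):
    """Stage 1: fixed prompt asking for selected and conditioned columns."""
    return STAGE1_TEMPLATE.format(columns=", ".join(columns), question=question)


def parse_key_columns(model_output, columns):
    """Parse the formatted stage-1 output, dropping names not in the schema."""
    data = json.loads(model_output)
    return {
        key: [c for c in data.get(key, []) if c in columns]
        for key in ("select", "where")
    }


def snap_to_table_value(value, cell_values, cutoff=0.6):
    """Replace a condition value with its closest match among actual cell values.

    Assumption: fuzzy string matching stands in for whatever constraint the
    paper applies; the value is returned unchanged when nothing is close.
    """
    matches = difflib.get_close_matches(value, cell_values, n=1, cutoff=cutoff)
    return matches[0] if matches else value


def build_stage2_prompt(question, key_columns, table):
    """Stage 2: variable prompt carrying key columns and candidate cell values."""
    lines = [
        f"Question: {question}",
        f"Selected columns: {', '.join(key_columns['select'])}",
    ]
    for col in key_columns["where"]:
        values = ", ".join(sorted(set(table[col])))
        lines.append(f"Condition column {col} has values: {values}")
    lines.append("Write one SQL query that answers the question.")
    return "\n".join(lines)


# Toy table (column name -> cell values as strings), invented for this example.
table = {"city": ["Shanghai", "Beijing"], "price": ["100", "200"]}
question = "What is the price in Shanghi?"

stage1_prompt = build_stage1_prompt(question, list(table))
# Stand-in for the model's stage-1 reply; "zip" is not a real column and is dropped.
key_columns = parse_key_columns(
    '{"select": ["price"], "where": ["city", "zip"]}', list(table)
)
stage2_prompt = build_stage2_prompt(question, key_columns, table)
corrected = snap_to_table_value("Shanghi", table["city"])  # "Shanghai"
```

Filtering the stage-1 output against the schema and snapping condition values to cells that actually occur in the table are two simple ways to keep a misspelled or hallucinated name from reaching the final SQL statement, which is the kind of error the abstract says the table-content constraint is meant to reduce.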

Pages: 10
