A Comprehensive Evaluation of Neural SPARQL Query Generation From Natural Language Questions

Cited by: 0

Authors:

Diallo, Papa Abdou Karim Karou [1]

Reyd, Samuel [2]

Zouaq, Amal [1]

Affiliations:

[1] Polytech Montreal, Dept Comp Engn & Software Engn, LAMA WeST Lab, Montreal, PQ H3T 1J4, Canada

[2] Telecom Paris, F-91120 Palaiseau, France

Source:

IEEE ACCESS | 2024 / Volume 12

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Annotations; Large language models; Computer architecture; Transformers; Vocabulary; Query processing; Knowledge based systems; Encoding; SPARQL query generation; knowledge base; copy mechanism; non pre-trained; pre-trained encoders-decoders;

DOI:

10.1109/ACCESS.2024.3453215

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In recent years, the field of neural machine translation (NMT) for SPARQL query generation has witnessed significant growth. Incorporating the copy mechanism with traditional encoder-decoder architectures and using pre-trained encoder-decoder and large language models have set new performance benchmarks. This paper presents various experiments that replicate and expand upon recent NMT-based SPARQL generation studies, comparing pre-trained language models (PLMs), non-pre-trained language models (NPLMs), and large language models (LLMs), highlighting the impact of question annotation and the copy mechanism and testing various fine-tuning methods using LLMs. In particular, we provide a systematic error analysis of the models and test their generalization ability. Our study demonstrates that the copy mechanism yields significant performance enhancements for most PLMs and NPLMs. Annotating the data is pivotal to generating correct URIs, with the "tag-within" strategy emerging as the most effective approach. Additionally, our findings reveal that the primary source of errors stems from incorrect URIs in SPARQL queries that are sometimes replaced with hallucinated URIs when using base models. This does not happen using the copy mechanism, but it sometimes leads to selecting wrong URIs among candidates. Finally, the performance of the tested LLMs fell short of achieving the desired outcomes.


Pages: 125057-125078

Page count: 22
