Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

Cited by: 11
Authors
Sui, Yuan [1 ,4 ]
Zhou, Mengyu [2 ]
Zhou, Mingjie [3 ,4 ]
Han, Shi [2 ]
Zhang, Dongmei [2 ]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
[2] Microsoft, Beijing, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res Asia, Beijing, Peoples R China
Keywords
large language models; semi-structured data; structural understanding capabilities; benchmark;
DOI
10.1145/3616855.3635752
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large language models (LLMs) are becoming attractive as few-shot reasoners for Natural Language (NL)-related tasks. However, there is still much to learn about how well LLMs understand structured data, such as tables. Although tables can be serialized as input to LLMs, there is a lack of comprehensive studies that examine whether LLMs can truly comprehend such data. In this paper, we try to understand this by designing a benchmark to evaluate the structural understanding capabilities (SUC) of LLMs. The benchmark comprises seven tasks, each with its own unique challenges, e.g., cell lookup, row retrieval, and size detection. We perform a series of evaluations on GPT-3.5 and GPT-4. We find that performance varies depending on several input choices, including table input format, content order, role prompting, and partition marks. Drawing from the insights gained through the benchmark evaluations, we propose self-augmentation for effective structural prompting, such as critical value / range identification using the internal knowledge of LLMs. When combined with carefully chosen input choices, these structural prompting methods lead to promising improvements in LLM performance on a variety of tabular tasks, e.g., TabFact (↑2.31%), HybridQA (↑2.13%), SQA (↑2.72%), Feverous (↑0.84%), and ToTTo (↑5.68%). We believe that our open-source benchmark and proposed prompting methods can serve as a simple yet generic baseline for future research.
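The two ideas the abstract describes — serializing a table into an LLM prompt and two-step "self-augmentation" prompting (first eliciting critical values/ranges from the model, then feeding them back with the task) — can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code; the function names (`serialize_table`, `build_self_augmentation_prompts`) are hypothetical, and the actual LLM call is left as a placeholder.

```python
def serialize_table(headers, rows):
    """Serialize a table into a markdown-style string for an LLM prompt.

    Markdown is one of several input formats the paper compares
    (alongside others such as CSV- or HTML-style serializations).
    """
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)


def build_self_augmentation_prompts(table_text, question):
    """Two-step self-augmentation prompting.

    Step 1 asks the model to surface structural hints (e.g. critical
    values and value ranges) from its own reading of the table.
    Step 2 prepends those hints to the downstream task prompt.
    """
    step1 = (
        "Identify the critical values and value ranges in this table:\n"
        f"{table_text}"
    )

    # `hints` would be the LLM's answer to step1; here it is a parameter
    # so the second prompt can be assembled once that answer is available.
    def step2(hints):
        return (
            f"{table_text}\n\n"
            f"Structural hints: {hints}\n\n"
            f"Question: {question}"
        )

    return step1, step2
```

In use, `step1` would be sent to the model first, and its response passed to `step2` to form the final task prompt; the paper reports that this kind of structural prompting, combined with good input-format choices, improves downstream tabular-task accuracy.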
Pages: 645-654 (10 pages)