PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities

Times cited: 0

Authors:

Sravanthi, Settaluri Lakshmi [1]

Doshi, Meet [1]

Kalyan, Tankala Pavan [1]

Murthy, Rudra [2]

Dabre, Raj [3]

Bhattacharyya, Pushpak [1]

Affiliations:

[1] Indian Inst Technol, CFILT, Mumbai, Maharashtra, India

[2] IBM Res, Armonk, NY USA

[3] NICT, Tokyo, Japan

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024 | 2024

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

LLMs have demonstrated remarkable capability for understanding semantics, but their understanding of pragmatics is not well studied. To this end, we release a Pragmatics Understanding Benchmark (PUB) dataset consisting of fourteen tasks covering four pragmatic phenomena, namely, Implicature, Presupposition, Reference, and Deixis. We curate high-quality test sets for each task, consisting of Multiple Choice Question Answers (MCQA). PUB includes a total of 28k data points, of which 6.1k are newly annotated. We evaluate nine models varying in the number of parameters and type of training. Our study reveals several key observations about the pragmatic capabilities of LLMs: 1. chat-fine-tuning strongly benefits smaller models, 2. large base models are competitive with their chat-fine-tuned counterparts, 3. there is a huge variance in performance across different pragmatic phenomena, and 4. there is a noticeable performance gap between human capabilities and model capabilities. We hope that PUB will enable comprehensive evaluation of LLMs' pragmatic reasoning capabilities.

Pages: 12075-12097

Page count: 23
