PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities

Authors
Sravanthi, Settaluri Lakshmi [1]
Doshi, Meet [1]
Kalyan, Tankala Pavan [1]
Murthy, Rudra [2]
Dabre, Raj [3]
Bhattacharyya, Pushpak [1]
Affiliations
[1] Indian Inst Technol, CFILT, Mumbai, Maharashtra, India
[2] IBM Res, Armonk, NY USA
[3] NICT, Tokyo, Japan
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
LLMs have demonstrated remarkable capability for understanding semantics, but their understanding of pragmatics is not well studied. To this end, we release a Pragmatics Understanding Benchmark (PUB) dataset consisting of fourteen tasks across four pragmatic phenomena, namely Implicature, Presupposition, Reference, and Deixis. We curate high-quality test sets for each task, consisting of Multiple Choice Question Answers (MCQA). PUB includes a total of 28k data points, of which 6.1k are newly annotated. We evaluate nine models varying in the number of parameters and type of training. Our study reveals several key observations about the pragmatic capabilities of LLMs: (1) chat fine-tuning strongly benefits smaller models, (2) large base models are competitive with their chat-fine-tuned counterparts, (3) performance varies widely across the different pragmatic phenomena, and (4) there is a noticeable gap between human and model performance. We hope that PUB will enable comprehensive evaluation of LLMs' pragmatic reasoning capabilities.
Pages: 12075-12097
Page count: 23