BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information

Times cited: 0
Authors
Kazemi, Mehran [1]
Yuan, Quan [1]
Bhatia, Deepti [1]
Kim, Najoung [1]
Xu, Xin [1]
Imbrasaite, Vaiva [1]
Ramachandran, Deepak [1]
Affiliation
[1] Google Research, Mountain View, CA 94043, USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automated reasoning with unstructured natural text is a key requirement for many potential applications of NLP and for developing robust AI systems. Recently, language models (LMs) have demonstrated complex reasoning capacities even without any finetuning. However, existing evaluations of automated reasoning assume access to a consistent and coherent set of information over which models reason. When reasoning in the real world, the available information is frequently inconsistent or contradictory, so models need a strategy for resolving such conflicts when they arise. One widely applicable way of resolving conflicts is to impose preferences over information sources (e.g., based on source credibility or information recency) and adopt the information from the more preferred source. In this paper, we formulate the problem of reasoning with contradictory information guided by preferences over sources as the classical problem of defeasible reasoning, and develop a dataset called BoardgameQA for measuring the reasoning capacity of LMs in this setting. BoardgameQA also incorporates reasoning with implicit background knowledge, to better reflect reasoning problems in downstream applications. We benchmark various LMs on BoardgameQA, and the results reveal a significant gap in the reasoning capacity of state-of-the-art LMs on this problem, showing that the ability to reason with conflicting information does not surface out of the box in LMs. While performance can be improved with finetuning, it nevertheless remains poor.
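The abstract's conflict-resolution strategy can be illustrated with a small defeasible-reasoning sketch. When two applicable rules reach opposite conclusions, the conclusion of the preferred rule is adopted. This is not the BoardgameQA generator or its data format. The `Rule` class, the `answer` function, the `"not "` negation prefix, and the three output labels are all hypothetical. The sketch also assumes a few properties of the input:

* the rule base is acyclic;
* the facts are mutually consistent;
* preferences are given as antisymmetric `(winner, loser)` pairs of rule names.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: if all premises hold, conclude `conclusion`."""
    name: str
    premises: tuple[str, ...]
    conclusion: str


def neg(lit: str) -> str:
    """Return the complementary literal ('p' <-> 'not p')."""
    return lit[4:] if lit.startswith("not ") else "not " + lit


def answer(query, facts, rules, prefers):
    """Backward-chaining sketch of preference-based defeasible reasoning.

    Assumes an acyclic rule set; `prefers` holds (winner, loser) name pairs.
    """

    @lru_cache(maxsize=None)
    def holds(lit):
        # Given facts are treated as certain.
        if lit in facts:
            return True
        if neg(lit) in facts:
            return False

        def applicable(concl):
            # Rules concluding `concl` whose premises are all established.
            return [r for r in rules
                    if r.conclusion == concl and all(holds(p) for p in r.premises)]

        supporters = applicable(lit)
        attackers = applicable(neg(lit))
        # `lit` is established if some supporting rule is preferred over
        # every applicable rule that concludes the opposite.
        return any(all((s.name, a.name) in prefers for a in attackers)
                   for s in supporters)

    if holds(query):
        return "proved"
    if holds(neg(query)):
        return "disproved"
    return "unknown"


facts = {"tweety is a bird", "tweety is a penguin"}
rules = [
    Rule("R1", ("tweety is a bird",), "tweety flies"),
    Rule("R2", ("tweety is a penguin",), "not tweety flies"),
]
print(answer("tweety flies", facts, rules, {("R2", "R1")}))  # disproved
print(answer("tweety flies", facts, rules, set()))           # unknown
```

Without a preference between the two conflicting rules, neither conclusion is adopted. This mirrors the abstract's point that the contradiction can only be resolved once a preference over sources is imposed.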
Pages: 23