Natural Questions: A Benchmark for Question Answering Research

Authors
Kwiatkowski T. [1]
Palomaki J. [1]
Redfield O. [1]
Collins M. [1, 2]
Parikh A. [1]
Alberti C. [1]
Epstein D. [1]
Polosukhin I. [1]
Devlin J. [1]
Lee K. [1]
Toutanova K. [1]
Jones L. [1]
Kelcey M. [1]
Chang M.-W. [1]
Dai A.M. [1]
Uszkoreit J. [1]
Le Q. [1]
Petrov S. [1]
Affiliations
[1] Google Research, United States
[2] Columbia University, United States
DOI
10.1162/tacl_a_00276
Abstract
We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 5-way annotated examples sequestered as test data. We present experiments validating the quality of the data. We also describe an analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature. © 2019 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.
Pages: 453-466
Page count: 13