Ditch the Gold Standard: Re-evaluating Conversational Question Answering

Cited by: 0
Authors
Li, Huihan [1]
Gao, Tianyu [1]
Goenka, Manan [1]
Chen, Danqi [1]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Conversational question answering aims to provide natural-language answers to users in information-seeking conversations. Existing conversational QA benchmarks compare models with pre-collected human-human conversations, using ground-truth answers provided in conversational history. It remains unclear whether we can rely on this static evaluation for model development and whether current systems can well generalize to real-world human-machine conversations. In this work, we conduct the first large-scale human evaluation of state-of-the-art conversational QA systems, where human evaluators converse with models and judge the correctness of their answers. We find that the distribution of human-machine conversations differs drastically from that of human-human conversations, and there is a disagreement between human and gold-history evaluation in terms of model ranking. We further investigate how to improve automatic evaluations, and propose a question rewriting mechanism based on predicted history, which better correlates with human judgments. Finally, we analyze the impact of various modeling strategies and discuss future directions towards building better conversational question answering systems.
Pages: 8074 - 8085
Number of pages: 12
Related papers
50 in total
  • [31] The Standard of Care in Type 2 Diabetes: Re-evaluating the Treatment Paradigm
    Mohan, Viswanathan
    Cooper, Mark E.
    Matthews, David R.
    Khunti, Kamlesh
    DIABETES THERAPY, 2019, 10 (Suppl 1) : S1 - S13
  • [32] For all the Wrong Reasons? Re-evaluating Truman, Domestic Influences, and the Palestine Question
    McBride, David
    DIGEST OF MIDDLE EAST STUDIES, 2005, 14 (02) : 27 - 49
  • [33] Integrating Question Rewriting in Conversational Question Answering: A Reinforcement Learning Approach
    Ishii, Etsuko
    Wilie, Bryan
    Xu, Yan
    Cahyawijaya, Samuel
    Fung, Pascale
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): STUDENT RESEARCH WORKSHOP, 2022, : 55 - 66
  • [34] Prompt Guided Copy Mechanism for Conversational Question Answering
    Zhang, Yong
    Li, Zhitao
    Wang, Jianzong
    Gao, Yiming
    Cheng, Ning
    Yu, Fengying
    Xiao, Jing
    INTERSPEECH 2023, 2023, : 3422 - 3426
  • [35] Knowledge Informed Semantic Parsing for Conversational Question Answering
    Thirukovalluru, Raghuveer
    Sridhar, Mukund
    Dung Thai
    Chanumolu, Shruti
    Monath, Nicholas
    Ananthakrishnan, Shankar
    McCallum, Andrew
    REPL4NLP 2021: PROCEEDINGS OF THE 6TH WORKSHOP ON REPRESENTATION LEARNING FOR NLP, 2021, : 231 - 240
  • [36] Towards a more Robust Evaluation for Conversational Question Answering
    Siblini, Wissam
    Sayil, Baris
    Kessaci, Yacine
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 1028 - 1034
  • [37] BERT with History Answer Embedding for Conversational Question Answering
    Qu, Chen
    Yang, Liu
    Qiu, Minghui
    Croft, W. Bruce
    Zhang, Yongfeng
    Iyyer, Mohit
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 1133 - 1136
  • [38] Explainable Conversational Question Answering over Heterogeneous Sources
    Christmann, Philipp
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 3499 - 3499
  • [39] An Empirical Study of Content Understanding in Conversational Question Answering
    Chiang, Ting-Rui
    Ye, Hao-Tong
    Chen, Yun-Nung
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 7578 - 7585
  • [40] Re-evaluating prokaryotic species
    Gevers, D
    Cohan, FM
    Lawrence, JG
    Spratt, BG
    Coenye, T
    Feil, EJ
    Stackebrandt, E
    Van de Peer, Y
    Vandamme, P
    Thompson, FL
    Swings, J
    NATURE REVIEWS MICROBIOLOGY, 2005, 3 (09) : 733 - 739