Multiple Testing in Statistical Analysis of Systems-Based Information Retrieval Experiments

Cited by: 61
Author
Carterette, Benjamin A. [1]
Affiliation
[1] Univ Delaware, Dept Comp & Informat Syst, Newark, DE 19716 USA
Keywords
Experimentation; Measurement; Theory; Information retrieval; effectiveness evaluation; test collections; experimental design; statistical analysis; INFERENCE;
DOI
10.1145/2094072.2094076
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
High-quality reusable test collections and formal statistical hypothesis testing together support a rigorous experimental environment for information retrieval research. But as Armstrong et al. [2009b] recently argued, global analysis of experiments suggests that there has actually been little real improvement in ad hoc retrieval effectiveness over time. We investigate this phenomenon in the context of simultaneous testing of many hypotheses using a fixed set of data. We argue that the most common approaches to significance testing ignore a great deal of information about the world. Taking into account even a fairly small amount of this information can lead to very different conclusions about systems than those that have appeared in published literature. We demonstrate how to model a set of IR experiments for analysis both mathematically and practically, and show that doing so can cause p-values from statistical hypothesis tests to increase by orders of magnitude. This has major consequences on the interpretation of experimental results using reusable test collections: it is very difficult to conclude that anything is significant once we have modeled many of the sources of randomness in experimental design and analysis.
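The abstract's point that testing many hypotheses on one fixed data set inflates p-values can be illustrated with a minimal sketch. This is not the model developed in the paper, which the abstract does not specify; it swaps in two standard textbook corrections (Bonferroni and Holm) purely for illustration. The function names and the raw p-values below are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment; never more conservative than Bonferroni."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (0-based rank) is scaled by (m - rank).
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: adjusted values never decrease along the ordering.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values from four system comparisons on one test collection.
raw = [0.01, 0.04, 0.03, 0.20]
print("raw:       ", raw)
print("bonferroni:", bonferroni(raw))
print("holm:      ", holm(raw))
```

With four comparisons, three of the raw values fall below 0.05, but only the smallest stays below 0.05 after either adjustment. The more comparisons share a test collection, the larger the multiplier becomes.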
Pages: 34