Multiple Testing in Statistical Analysis of Systems-Based Information Retrieval Experiments

Cited by: 61
Author
Carterette, Benjamin A. [1]
Affiliation
[1] Univ Delaware, Dept Comp & Informat Syst, Newark, DE 19716 USA
Keywords
Experimentation; Measurement; Theory; Information retrieval; effectiveness evaluation; test collections; experimental design; statistical analysis; INFERENCE;
DOI
10.1145/2094072.2094076
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
High-quality reusable test collections and formal statistical hypothesis testing together support a rigorous experimental environment for information retrieval research. But as Armstrong et al. [2009b] recently argued, global analysis of experiments suggests that there has actually been little real improvement in ad hoc retrieval effectiveness over time. We investigate this phenomenon in the context of simultaneous testing of many hypotheses using a fixed set of data. We argue that the most common approaches to significance testing ignore a great deal of information about the world. Taking into account even a fairly small amount of this information can lead to very different conclusions about systems than those that have appeared in published literature. We demonstrate how to model a set of IR experiments for analysis both mathematically and practically, and show that doing so can cause p-values from statistical hypothesis tests to increase by orders of magnitude. This has major consequences on the interpretation of experimental results using reusable test collections: it is very difficult to conclude that anything is significant once we have modeled many of the sources of randomness in experimental design and analysis.
Pages: 34
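The abstract's claim that p-values can grow substantially once many hypotheses are tested on the same data can be illustrated with two standard multiple-comparison adjustments, Bonferroni and Holm. This is only a minimal sketch: these generic corrections are stand-ins chosen for brevity and are not claimed to be the modeling approach of the paper itself, and the function names and the raw p-values below are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment; never more conservative than Bonferroni."""
    m = len(p_values)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values may not decrease along the ordering.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values from five pairwise system comparisons
    # run on the same test collection (made up for illustration).
    raw = [0.001, 0.01, 0.03, 0.04, 0.2]
    print("raw       :", raw)
    print("bonferroni:", bonferroni(raw))
    print("holm      :", holm(raw))
```

With only five comparisons, several results that look significant at the 0.05 level in isolation no longer are after adjustment; the effect becomes stronger as the number of hypotheses tested against one collection grows.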