Multiple Testing in Statistical Analysis of Systems-Based Information Retrieval Experiments

Cited by: 61
Authors
Carterette, Benjamin A. [1]
Affiliation
[1] Univ Delaware, Dept Comp & Informat Syst, Newark, DE 19716 USA
Keywords
Experimentation; Measurement; Theory; Information retrieval; effectiveness evaluation; test collections; experimental design; statistical analysis; INFERENCE
DOI
10.1145/2094072.2094076
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
High-quality reusable test collections and formal statistical hypothesis testing together support a rigorous experimental environment for information retrieval research. But as Armstrong et al. [2009b] recently argued, global analysis of experiments suggests that there has actually been little real improvement in ad hoc retrieval effectiveness over time. We investigate this phenomenon in the context of simultaneous testing of many hypotheses using a fixed set of data. We argue that the most common approaches to significance testing ignore a great deal of information about the world. Taking into account even a fairly small amount of this information can lead to very different conclusions about systems than those that have appeared in published literature. We demonstrate how to model a set of IR experiments for analysis both mathematically and practically, and show that doing so can cause p-values from statistical hypothesis tests to increase by orders of magnitude. This has major consequences on the interpretation of experimental results using reusable test collections: it is very difficult to conclude that anything is significant once we have modeled many of the sources of randomness in experimental design and analysis.
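To illustrate why p-values grow when many hypotheses are tested on the same data, here is a minimal sketch using the standard Bonferroni and Holm adjustments. This is not the model-based analysis the abstract describes; it is a simpler, well-known stand-in, and the function names and the example p-values are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def holm_adjust(p_values):
    """Holm step-down adjustment: less conservative than Bonferroni.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone in the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical unadjusted p-values from four pairwise system comparisons.
    raw = [0.01, 0.04, 0.03, 0.005]
    print("raw:       ", raw)
    print("bonferroni:", bonferroni_adjust(raw))
    print("holm:      ", holm_adjust(raw))
```

With only four comparisons, every adjusted value is at least as large as its raw value, and some results that fall below 0.05 unadjusted no longer do. The effect compounds as the number of systems compared on a reusable test collection grows.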
Pages: 34
Related papers
50 items in total
  • [31] CULTURAL TRANSMISSION IN 3 SOCIETIES - TESTING A SYSTEMS-BASED FIELD GUIDE
    DOBBERT, ML
    PITMAN, MA
    EISIKOVITS, RA
    GAMRADT, JK
    CHUN, KS
    ANTHROPOLOGY & EDUCATION QUARTERLY, 1984, 15 (04) : 275 - 311
  • [32] Cross-language Information Retrieval Based on Multiple Information
    Liu, Pengyuan
    Zheng, Zhijun
    Su, Qi
    2018 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2018), 2018, : 623 - 626
  • [33] SELECTED RESULTS FROM AN INQUIRY INTO TESTING OF INFORMATION RETRIEVAL SYSTEMS
    SARACEVIC, T
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1971, 22 (02) : 126 - 139
  • [34] INFORMATION RETRIEVAL SYSTEMS - CHARACTERISTICS, TESTING, AND EVALUATION - LANCASTER,FW
    Unknown
    AMERICAN DOCUMENTATION, 1969, 20 (02) : 173 - 173
  • [35] Evolving Neuro-Fuzzy Systems-Based Design of Experiments in Process Identification
    Ozbot, Miha
    Lughofer, Edwin
    Skrjanc, Igor
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2023, 31 (06) : 1995 - 2005
  • [36] Biodefense Policy Analysis-A Systems-based Approach
    DiEuliis, Diane
    Rao, Venkat
    Billings, Emily A.
    Meyer, Corey B.
    Berger, Kavita
    HEALTH SECURITY, 2019, 17 (02) : 83 - 99
  • [37] Geographic information systems-based pavement management system - A case study
    Medina, A
    Flintsch, GW
    Zaniewski, JP
    SEVENTH INTERNATIONAL CONFERENCE ON LOW-VOLUME ROADS 1999, VOL 2: PLANNING, ADMINISTRATION, AND ENVIRONMENT; DESIGN; MATERIALS, CONSTRUCTION, AND MAINTENANCE; OPERATIONS AND SAFETY, 1999, (1652) : 151 - 157
  • [38] Generalized neighborhood systems-based pessimistic rough sets and their applications in incomplete information systems
    Pang, Jing
    Yao, Bingxue
    Li, Lingqiang
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 42 (03) : 2713 - 2725
  • [39] Experiments in discourse analysis impact on information classification and retrieval algorithms
    Morato, J
    Llorens, J
    Genova, G
    Moreiro, JA
    INFORMATION PROCESSING & MANAGEMENT, 2003, 39 (06) : 825 - 851
  • [40] On the Universality and Contributions of Multiple Criteria Decision Making: A Systems-Based Approach
    Haimes, Yacov
    JOURNAL OF MULTI-CRITERIA DECISION ANALYSIS, 2011, 18 (1-2) : 91 - 99