Query Variability and Experimental Consistency: A Concerning Case Study

Cited by: 0
Authors
Rashidi, Lida [1, 2]
Zobel, Justin [1 ]
Moffat, Alistair [1 ]
Affiliations
[1] Univ Melbourne, Melbourne, Vic, Australia
[2] RMIT Univ, Melbourne, Vic, Australia
Source
PROCEEDINGS OF THE 2024 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2024 | 2024
Funding
Australian Research Council
Keywords
Evaluation; significance testing
DOI
10.1145/3664190.3672519
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
In offline experimentation, the effectiveness of a search engine is evaluated using a document collection, a set of queries against that collection, a set of relevance judgments connecting the documents and the queries, and an effectiveness metric. This measurement pipeline is used as a surrogate for user satisfaction - the extent to which the system provides useful information to the users that are issuing the queries. But queries are responses to information needs, or topics, and there can be a wide variety of ways in which any given information need can be expressed as a query. That one-to-many relationship suggests that, in an IR experiment, use of any single query to represent a topic may be insufficient. In this case study, we demonstrate that this practice is indeed a weakness, by showing that the TREC 2013 and 2014 Web track queries, which are regarded as being indicative of specific information needs, are not necessarily representative of crowd-generated queries for the same underlying needs, and can give rise to inconsistent system relativities when compared to user-generated queries. From this instance we must thus note an element of concern: that current test collection design strategies can lead to effectiveness results that are at odds with those experienced by typical non-expert users.
Pages: 35-41
Number of pages: 7
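The abstract's concern, that the single query chosen to represent a topic can order two systems differently than other phrasings of the same need, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the document ids, the rankings, the system names, and the choice of precision at depth k as the effectiveness metric are invented toy values, not data or methods taken from the paper.

```python
from statistics import mean


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


# Hypothetical relevance judgments for one topic (toy values).
relevant = {"d1", "d2", "d3"}

# Hypothetical rankings returned by two systems for three phrasings
# of the same information need (toy values).
runs = {
    "system_A": {
        "track query": ["d1", "d2", "d9", "d8"],
        "variant 1": ["d9", "d1", "d8", "d7"],
        "variant 2": ["d8", "d9", "d3", "d7"],
    },
    "system_B": {
        "track query": ["d1", "d9", "d8", "d7"],
        "variant 1": ["d1", "d2", "d9", "d8"],
        "variant 2": ["d3", "d2", "d9", "d8"],
    },
}


def mean_score(system, queries, k=4):
    """Mean precision@k of one system over the given query phrasings."""
    return mean(precision_at_k(runs[system][q], relevant, k) for q in queries)


def preferred_system(queries, k=4):
    """Name of the system with the higher mean score on these queries."""
    return max(runs, key=lambda system: mean_score(system, queries, k))


# Compare the verdict from the single representative query with the
# verdict from the alternative phrasings of the same topic.
print(preferred_system(["track query"]))
print(preferred_system(["variant 1", "variant 2"]))
```

In this toy setup the two calls name different systems, which is the kind of inconsistent system relativity the abstract describes. A real study would use many topics and a significance test rather than a single comparison of means.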