Query Variability and Experimental Consistency: A Concerning Case Study

Cited by: 0
Authors
Rashidi, Lida [1, 2]
Zobel, Justin [1]
Moffat, Alistair [1]
Affiliations
[1] Univ Melbourne, Melbourne, Vic, Australia
[2] RMIT Univ, Melbourne, Vic, Australia
Source
PROCEEDINGS OF THE 2024 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2024 | 2024
Funding
Australian Research Council;
Keywords
Evaluation; significance testing;
DOI
10.1145/3664190.3672519
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In offline experimentation, the effectiveness of a search engine is evaluated using a document collection, a set of queries against that collection, a set of relevance judgments connecting the documents and the queries, and an effectiveness metric. This measurement pipeline is used as a surrogate for user satisfaction - the extent to which the system provides useful information to the users that are issuing the queries. But queries are responses to information needs, or topics, and there can be a wide variety of ways in which any given information need can be expressed as a query. That one-to-many relationship suggests that, in an IR experiment, use of any single query to represent a topic may be insufficient. In this case study, we demonstrate that this practice is indeed a weakness, by showing that the TREC 2013 and 2014 Web track queries, which are regarded as being indicative of specific information needs, are not necessarily representative of crowd-generated queries for the same underlying needs, and can give rise to inconsistent system relativities when compared to user-generated queries. From this instance we must thus note an element of concern: that current test collection design strategies can lead to effectiveness results that are at odds with those experienced by typical non-expert users.
Pages: 35-41
Page count: 7