Estimating Measurement Uncertainty for Information Retrieval Effectiveness Metrics

Cited by: 3
Authors
Moffat, Alistair [1]
Scholer, Falk [2]
Yang, Ziying [1]
Affiliations
[1] Univ Melbourne, Sch Comp & Informat Syst, Melbourne, Vic 3010, Australia
[2] RMIT Univ, Sch Comp Sci & Informat Technol, Melbourne, Vic 3001, Australia
Funding
Australian Research Council
Keywords
Evaluation; test collection; effectiveness metric; statistical test; information retrieval
DOI
10.1145/3239572
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
One typical way of building test collections for offline measurement of information retrieval systems is to pool the ranked outputs of different systems down to some chosen depth d and then form relevance judgments for those documents only. Non-pooled documents (ones that did not appear in the top-d sets of any of the contributing systems) are then deemed to be non-relevant for the purposes of evaluating the relative behavior of the systems. In this article, we use RBP-derived residuals to re-examine the reliability of that process. By fitting the RBP parameter phi to maximize similarity between AP- and NDCG-induced system rankings, on the one hand, and RBP-induced rankings, on the other, an estimate can be made as to the potential score uncertainty associated with those two recall-based metrics. We then consider the effect that residual size (an indicator of possible measurement uncertainty in utility-based metrics) has in connection with recall-based metrics, by computing the effect of increasing pool sizes and examining the trends that arise in terms of both metric score and system separability using standard statistical tests. The experimental results show that the confidence levels expressed via the p-values generated by statistical tests are only weakly connected to the size of the residual and to the degree of measurement uncertainty caused by the presence of unjudged documents. Statistical confidence estimates are, however, largely consistent as pooling depths are altered. We therefore recommend that all such experimental results should report, in addition to the outcomes of statistical significance tests, the residual measurements generated by a suitably matched weighted-precision metric, to give a clear indication of the measurement uncertainty that arises due to the presence of unjudged documents in test collections with finite pooled judgments.
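To make the notion of an RBP residual concrete, the following is a minimal sketch and not the authors' code. It assumes the usual definition of rank-biased precision, in which the document at rank i carries weight (1 - phi) * phi^(i-1), and it treats the residual as the total weight attached to unjudged documents plus the weight of all ranks beyond the end of the evaluated ranking. The function name `rbp_with_residual` and the use of `None` to mark an unjudged document are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def rbp_with_residual(
    judgments: Sequence[Optional[int]], phi: float
) -> Tuple[float, float]:
    """Return (base_score, residual) for one ranked list.

    judgments[i] describes the document at rank i + 1:
    1 = judged relevant, 0 = judged non-relevant, None = unjudged.
    phi is the RBP persistence parameter, with 0 < phi < 1.
    """
    if not 0.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between 0 and 1")

    base = 0.0      # score if every unjudged document is non-relevant
    residual = 0.0  # weight that unjudged documents could still contribute
    weight = 1.0 - phi  # weight of rank 1

    for judgment in judgments:
        if judgment is None:
            residual += weight
        else:
            base += weight * judgment
        weight *= phi  # geometric decay to the next rank

    # Ranks past the end of the list are unseen; their weights sum to phi^k.
    residual += phi ** len(judgments)
    return base, residual


# Hypothetical ranking: relevant, unjudged, non-relevant, relevant.
base, residual = rbp_with_residual([1, None, 0, 1], phi=0.5)
print(base, residual)  # 0.5625 0.3125
```

The true score lies somewhere in the interval from `base` to `base + residual`, so a large residual signals that unjudged documents could shift the measurement substantially.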
Pages: 22