Assessment of the Quality of Topic Models for Information Retrieval Applications

Cited by: 1
Authors
Yuan, Meng [1]
Lin, Pauline [1]
Rashidi, Lida [1]
Zobel, Justin [1]
Affiliations
[1] Univ Melbourne, Parkville, Vic, Australia
Source
PROCEEDINGS OF THE 2023 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2023 | 2023
Keywords
topic modelling; topic coherence; collection representation; PHRASE;
DOI
10.1145/3578337.3605118
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Topic modelling is an approach to generation of descriptions of document collections as a set of topics where each has a distinct theme and documents are a blend of topics. It has been applied to retrieval in a range of ways, but there has been little prior work on measurement of whether the topics are descriptive in this context. Moreover, existing methods for assessment of topic quality do not consider how well individual documents are described. To address this issue we propose a new measure of topic quality, which we call specificity; the basis of this measure is the extent to which individual documents are described by a limited number of topics. We also propose a new experimental protocol for validating topic-quality measures, a 'noise dial' that quantifies the extent to which the measure's scores are altered as the topics are degraded by addition of noise. The principle of the mechanism is that a meaningful measure should produce low scores if the 'topics' are essentially random. We show that specificity is at least as effective as existing measures of topic quality and does not require external resources. While other measures relate only to topics, not to documents, we further show that specificity correlates to the extent to which topic models are informative in the retrieval process.
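The abstract gives the idea behind specificity and the 'noise dial' but not their formulas. The sketch below is only an illustration of that idea, not the authors' definition: it assumes each document is a row of topic weights summing to 1, scores a collection by how few topics are needed to cover 90% of each document's weight, and degrades the rows by blending them with random distributions. The names `topics_needed`, `concentration_score`, and `add_noise`, the 0.9 threshold, and the choice to add noise to document-topic rows are all hypothetical.

```python
import random


def topics_needed(doc_topic, mass=0.9):
    """Smallest number of topics whose weights sum to at least `mass`."""
    total = 0.0
    for count, weight in enumerate(sorted(doc_topic, reverse=True), start=1):
        total += weight
        if total >= mass - 1e-12:  # small tolerance for float rounding
            return count
    return len(doc_topic)


def concentration_score(doc_topics, mass=0.9):
    """Hypothetical proxy, NOT the paper's specificity formula.

    Returns 1.0 when every document is covered by a single topic and
    0.0 when every document needs all topics.
    """
    num_topics = len(doc_topics[0])
    if num_topics == 1:
        return 1.0
    mean_needed = sum(topics_needed(d, mass) for d in doc_topics) / len(doc_topics)
    return 1.0 - (mean_needed - 1) / (num_topics - 1)


def add_noise(doc_topics, level, rng):
    """Assumed 'noise dial': blend each row with a random distribution.

    level=0.0 leaves the rows unchanged; level=1.0 replaces them with noise.
    """
    noisy = []
    for row in doc_topics:
        noise = [rng.random() for _ in row]
        total = sum(noise)
        noisy.append([(1 - level) * w + level * n / total
                      for w, n in zip(row, noise)])
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [[0.95, 0.03, 0.02, 0.0], [0.0, 0.92, 0.05, 0.03]]
    for level in (0.0, 0.5, 1.0):
        score = concentration_score(add_noise(clean, level, rng))
        print(f"noise level {level:.1f}: score {score:.2f}")
```

Under these assumptions, a measure that behaves as the abstract requires should give its highest score at noise level 0 and lower scores as the level approaches 1, where the rows are essentially random.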
Pages: 265-274
Page count: 10