Assessment of the Quality of Topic Models for Information Retrieval Applications

Cited by: 1

Authors:

Yuan, Meng [1]

Lin, Pauline [1]

Rashidi, Lida [1]

Zobel, Justin [1]

Affiliation:

[1] Univ Melbourne, Parkville, Vic, Australia

Source:

PROCEEDINGS OF THE 2023 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2023 | 2023

Keywords:

topic modelling; topic coherence; collection representation; PHRASE;

DOI:

10.1145/3578337.3605118

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Topic modelling is an approach to generation of descriptions of document collections as a set of topics where each has a distinct theme and documents are a blend of topics. It has been applied to retrieval in a range of ways, but there has been little prior work on measurement of whether the topics are descriptive in this context. Moreover, existing methods for assessment of topic quality do not consider how well individual documents are described. To address this issue we propose a new measure of topic quality, which we call specificity; the basis of this measure is the extent to which individual documents are described by a limited number of topics. We also propose a new experimental protocol for validating topic-quality measures, a 'noise dial' that quantifies the extent to which the measure's scores are altered as the topics are degraded by addition of noise. The principle of the mechanism is that a meaningful measure should produce low scores if the 'topics' are essentially random. We show that specificity is at least as effective as existing measures of topic quality and does not require external resources. While other measures relate only to topics, not to documents, we further show that specificity correlates to the extent to which topic models are informative in the retrieval process.
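The idea behind specificity and the 'noise dial' can be illustrated with a small sketch. This is not the paper's definition: the abstract does not give the formula, so the code below assumes a hypothetical proxy, one minus the normalised entropy of each document's topic proportions, averaged over documents. The noise step is also a simplification that blends the document-topic proportions with random distributions, whereas the abstract describes degrading the topics themselves. The names `specificity_proxy` and `add_noise` are illustrative only.

```python
import math
import random


def specificity_proxy(doc_topic):
    """Hypothetical proxy (an assumption, not the paper's measure).

    doc_topic: list of rows, one per document, each a list of non-negative
    topic weights. Returns the mean of (1 - normalised entropy), so a
    document described by a single topic scores 1 and a document spread
    evenly over all topics scores 0.
    """
    num_topics = len(doc_topic[0])
    scores = []
    for row in doc_topic:
        total = sum(row)
        probs = [w / total for w in row]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        scores.append(1.0 - entropy / math.log(num_topics))
    return sum(scores) / len(scores)


def add_noise(doc_topic, level, rng):
    """Simplified 'noise dial': blend each row with a random distribution.

    level = 0 leaves the row unchanged; level = 1 replaces it with noise.
    """
    noisy = []
    for row in doc_topic:
        noise = [rng.random() for _ in row]
        noise_total = sum(noise)
        noise = [n / noise_total for n in noise]
        noisy.append([(1 - level) * p + level * n for p, n in zip(row, noise)])
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    # Twenty toy documents, each dominated by one of four topics.
    docs = [[0.9, 0.05, 0.05, 0.0][i % 4:] + [0.9, 0.05, 0.05, 0.0][:i % 4]
            for i in range(20)]
    for level in (0.0, 0.5, 1.0):
        print(level, round(specificity_proxy(add_noise(docs, level, rng)), 3))
```

Under these assumptions, a measure that behaves as the abstract requires should score the undegraded proportions clearly higher than the fully randomised ones.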

Pages: 265-274

Page count: 10

