Learning to Rank from Noisy Data

Cited by: 6
Authors
Ding, Wenkui [1 ]
Geng, Xiubo [2 ]
Zhang, Xu-Dong [1 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
[2] Yahoo Labs Beijing, Beijing, Peoples R China
Keywords
Noisy data; robust learning
DOI
10.1145/2576230
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Learning to rank, which learns the ranking function from training data, has become an emerging research area in information retrieval and machine learning. Most existing work on learning to rank assumes that the training data is clean, which, however, is not always the case. The ambiguity of query intent, the lack of domain knowledge, and the vague definition of relevance levels all make it difficult for common annotators to give reliable relevance labels to some documents. As a result, the relevance labels in the training data of learning to rank usually contain noise. If we ignore this fact, the performance of learning-to-rank algorithms will be degraded. In this article, we propose considering the labeling noise in the process of learning to rank and using a two-step approach to extend existing algorithms to handle noisy training data. In the first step, we estimate the degree of labeling noise for each training document. To this end, we assume that the majority of the relevance labels in the training data are reliable, and we use a graphical model to describe the generative process of a training query, the feature vectors of its associated documents, and the relevance labels of these documents. The parameters of the graphical model are learned by maximum likelihood estimation. Then the conditional probability of the relevance label given the feature vector of a document is computed. If this probability is large, we regard the degree of labeling noise for the document as small; otherwise, we regard it as large. In the second step, we extend existing learning-to-rank algorithms by incorporating the estimated degree of labeling noise into their loss functions. Specifically, we give larger weights to training documents with smaller degrees of labeling noise and smaller weights to those with larger degrees. As examples, we demonstrate the extensions for McRank, RankSVM, RankBoost, and RankNet. Empirical results on benchmark datasets show that the proposed approach can effectively distinguish noisy documents from clean ones, and that the extended learning-to-rank algorithms achieve better performance than the baselines.
Pages: 21
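The abstract above describes a two-step scheme: estimate P(label | features) with a generative model learned by maximum likelihood, then use that probability to down-weight likely-noisy documents in the loss of an existing learning-to-rank algorithm. The sketch below is only an illustration of that weighting idea, not the authors' implementation: it substitutes a simple class-conditional diagonal-Gaussian model (fit by closed-form MLE) for the paper's graphical model, and plugs the resulting reliability weights into a RankSVM-style pairwise hinge loss. All function names and the toy data are hypothetical.

```python
# Minimal sketch (assumed stand-in for the paper's method): per-document
# reliability weights from P(observed label | feature vector), used to
# reweight a RankSVM-style pairwise hinge loss.
import numpy as np

def fit_label_conditional_gaussians(X, y):
    """Fit one diagonal Gaussian per relevance level via closed-form MLE
    (a simple stand-in for the paper's generative graphical model)."""
    params = {}
    for level in np.unique(y):
        Xl = X[y == level]
        mu = Xl.mean(axis=0)
        var = Xl.var(axis=0) + 1e-6           # avoid zero variance
        prior = len(Xl) / len(X)
        params[level] = (mu, var, prior)
    return params

def label_posteriors(X, y, params):
    """P(observed label | feature vector): large value => low labeling noise."""
    levels = sorted(params)
    log_joint = np.stack([
        -0.5 * (((X - params[l][0]) ** 2 / params[l][1])
                + np.log(2 * np.pi * params[l][1])).sum(axis=1)
        + np.log(params[l][2])
        for l in levels
    ], axis=1)                                 # shape (n_docs, n_levels)
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)
    idx = np.searchsorted(levels, y)
    return post[np.arange(len(y)), idx]

def weighted_pairwise_hinge_loss(scores, y, w):
    """RankSVM-style pairwise hinge loss with per-document weights w
    (higher w = presumed cleaner label, so its pairs count more)."""
    loss, n_pairs = 0.0, 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:                    # document i should rank above j
                loss += w[i] * w[j] * max(0.0, 1.0 - (scores[i] - scores[j]))
                n_pairs += 1
    return loss / max(n_pairs, 1)

# Toy usage on synthetic, hypothetical data with injected label noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] > 0).astype(int)                  # two relevance levels
y[rng.choice(100, 10, replace=False)] ^= 1     # flip 10% of the labels
params = fit_label_conditional_gaussians(X, y)
w = label_posteriors(X, y, params)             # noisy docs tend to get small w
scores = X[:, 0]                               # pretend these are model scores
print(weighted_pairwise_hinge_loss(scores, y, w))
```

The same weights could be attached to the per-document or per-pair terms of McRank, RankBoost, or RankNet losses in the analogous way; the pairwise hinge form is shown here only because it is the most compact to write out.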