An Outlier-Resilient Autoencoder for Representing High-Dimensional and Incomplete Data

Cited by: 1

Authors:

Wu, Di [1]

Hu, Yuanpeng [2]

Liu, Kechen [3]

Li, Jing [2]

Wang, Xianmin [2]

Deng, Song [4]

Zheng, Nenggan [5]

Luo, Xin [1]

Affiliations:

[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China

[2] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou 510002, Peoples R China

[3] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA

[4] Nanjing Univ Post & Telecommun, Inst Adv Technol, Nanjing 210003, Peoples R China

[5] Zhejiang Univ Hangzhou, Qiushi Acad Adv Studies QAAS, Hangzhou 310007, Zhejiang, Peoples R China

Source:

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2024

Funding:

National Natural Science Foundation of China;

Keywords:

Data models; Loss measurement; Computational modeling; Predictive models; Standards; Recommender systems; Analytical models; High-dimensional and incomplete data; recommendation model; outlier; cauchy loss; collaborative filtering; LATENT FACTOR-ANALYSIS; RECOMMENDATION; FACTORIZATION;

DOI:

10.1109/TETCI.2024.3437370

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

High-dimensional and incomplete (HDI) data commonly arise in various Big Data-related applications, e.g., recommender systems and bioinformatics. Representation learning maps HDI data into a low-dimensional latent space to extract valuable knowledge and patterns. Currently, deep neural networks (DNNs) are among the most popular and successful approaches to representing HDI data owing to their powerful nonlinear learning ability. However, previous DNN-based approaches have primarily focused on advancing sophisticated model structures, neglecting the potential adverse effects of outliers. Unfortunately, outliers usually exist in collected HDI data. For example, HDI data collected from recommender systems inevitably contain many outlier ratings due to malicious users. To address this issue, this paper proposes a novel outlier-resilient autoencoder (termed ORA). Its core idea is to design an adaptive Cauchy loss strategy to measure the difference between the observed and predicted data for an autoencoder in representing HDI data. This strategy leverages a more aggressive Cauchy loss to impose a higher penalty on outlier data with large deviation, while utilizing a smoother Cauchy loss to capture the nuanced, deeper features of HDI data. As such, ORA can dynamically adjust the smoothness of the Cauchy loss during training to handle different levels of data deviation. To evaluate the proposed ORA, extensive experiments are conducted on five benchmark HDI datasets. The results validate that: (1) ORA achieves significantly better representation accuracy than state-of-the-art DNN- and non-DNN-based models, and (2) ORA possesses higher robustness to outlier data than its peers.
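
The abstract does not give the loss formula or the rule ORA uses to adapt it, so the following is only a minimal sketch of the general idea, assuming the standard Cauchy (Lorentzian) loss with a scale parameter `gamma` evaluated on observed entries only. The function names, the `mask` convention (1 = observed, 0 = missing), and the fixed `gamma` per call are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def cauchy_loss(pred, target, mask, gamma):
    """Mean Cauchy loss over observed entries (mask == 1).

    Per-entry loss: 0.5 * gamma^2 * log(1 + (err / gamma)^2).
    `gamma` is a hypothetical scale parameter; how ORA adapts it
    during training is not described in the abstract.
    """
    err = (pred - target) * mask  # missing entries contribute zero error
    per_entry = 0.5 * gamma**2 * np.log1p((err / gamma) ** 2)
    return per_entry.sum() / mask.sum()


def cauchy_grad(err, gamma):
    """Derivative of the per-entry loss w.r.t. the error.

    Its magnitude never exceeds gamma / 2, so a single large
    deviation cannot dominate a gradient step.
    """
    return err / (1.0 + (err / gamma) ** 2)


# Toy incomplete rating matrix: zeros in `mask` mark missing entries.
target = np.array([[5.0, 0.0, 3.0],
                   [0.0, 1.0, 4.0]])
mask = np.array([[1.0, 0.0, 1.0],
                 [0.0, 1.0, 1.0]])
pred = np.array([[4.5, 2.0, 3.0],
                 [1.0, 5.0, 4.0]])  # entry (1, 1) deviates by 4

# Compare against half the mean squared error on the same entries.
half_mse = 0.5 * (((pred - target) * mask) ** 2).sum() / mask.sum()
for gamma in (0.5, 1.0, 1e3):
    print(gamma, cauchy_loss(pred, target, mask, gamma), half_mse)
```

With a very large `gamma` the Cauchy loss approaches half the squared error, while smaller values flatten the penalty for large deviations. Choosing or scheduling `gamma` is exactly the part the abstract attributes to ORA's adaptive strategy, and it is left open here.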

Pages: 13

50 items in total

[1] Variational autoencoder-based outlier detection for high-dimensional data
Li, Yongmou
Wang, Yijie
Ma, Xingkong
INTELLIGENT DATA ANALYSIS, 2019, 23 (05) : 991 - 1002
[2] MMA: Multi-Metric-Autoencoder for Analyzing High-Dimensional and Incomplete Data
Liang, Cheng
Wu, Di
He, Yi
Huang, Teng
Chen, Zhong
Luo, Xin
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT V, 2023, 14173 : 3 - 19
[3] Outlier detection for high-dimensional data
Ro, Kwangil
Zou, Changliang
Wang, Zhaojun
Yin, Guosheng
BIOMETRIKA, 2015, 102 (03) : 589 - 599
[4] Intrinsic dimensional outlier detection in high-dimensional data
Von Brünken, Jonathan
Houle, Michael E.
Zimek, Arthur
NII Technical Reports, 2015, (03) : 1 - 12
[5] Efficient Outlier Detection for High-Dimensional Data
Liu, Huawen
Li, Xuelong
Li, Jiuyong
Zhang, Shichao
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2018, 48 (12) : 2451 - 2461
[6] A geometric framework for outlier detection in high-dimensional data
Herrmann, Moritz
Pfisterer, Florian
Scheipl, Fabian
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 13 (03)
[7] A Comparison of Outlier Detection Techniques for High-Dimensional Data
Xu, Xiaodan
Liu, Huawen
Li, Li
Yao, Minghai
INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2018, 11 (01) : 652 - 662
[8] Adaptive Clustering for Outlier Identification in High-Dimensional Data
Thudumu, Srikanth
Branch, Philip
Jin, Jiong
Singh, Jugdutt
ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2019, PT II, 2020, 11945 : 215 - 228
[9] Variable selection for high-dimensional incomplete data
Liang, Lixing
Zhuang, Yipeng
Yu, Philip L. H.
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2024, 192
[10] Outlier mining in large high-dimensional data sets
Angiulli, F
Pizzuti, C
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (02) : 203 - 215
