An Outlier-Resilient Autoencoder for Representing High-Dimensional and Incomplete Data

Cited by: 1

Authors:

Wu, Di [1]

Hu, Yuanpeng [2]

Liu, Kechen [3]

Li, Jing [2]

Wang, Xianmin [2]

Deng, Song [4]

Zheng, Nenggan [5]

Luo, Xin [1]

Affiliations:

[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China

[2] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou 510002, Peoples R China

[3] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA

[4] Nanjing Univ Post & Telecommun, Inst Adv Technol, Nanjing 210003, Peoples R China

[5] Zhejiang Univ Hangzhou, Qiushi Acad Adv Studies QAAS, Hangzhou 310007, Zhejiang, Peoples R China

Source:

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2024

Funding:

National Natural Science Foundation of China

Keywords:

Data models; Loss measurement; Computational modeling; Predictive models; Standards; Recommender systems; Analytical models; High-dimensional and incomplete data; recommendation model; outlier; Cauchy loss; collaborative filtering; LATENT FACTOR-ANALYSIS; RECOMMENDATION; FACTORIZATION

DOI:

10.1109/TETCI.2024.3437370

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

High-dimensional and incomplete (HDI) data commonly arise in various Big Data-related applications, e.g., recommender systems and bioinformatics. Representation learning is a paradigm that maps HDI data into a low-dimensional latent space to extract valuable knowledge and patterns. Currently, deep neural networks (DNNs) are among the most popular and successful approaches to representing HDI data due to their powerful nonlinear learning ability. However, previous DNN-based approaches primarily focused on advancing sophisticated model structures, neglecting the potential adverse effects of outliers. Unfortunately, outliers usually exist in collected HDI data. For example, HDI data collected from recommender systems inevitably contain many outlier ratings due to some malicious users. To address this issue, this paper proposes a novel outlier-resilient autoencoder (termed ORA). Its core idea is to design an adaptive Cauchy loss strategy to measure the difference between the observed and predicted data for an autoencoder in representing HDI data. This strategy leverages a more aggressive Cauchy loss to impose a higher penalty on outlier data with large deviation, while utilizing a smoother Cauchy loss to capture the nuanced, deeper features of HDI data. As such, ORA can dynamically adjust the smoothness of the Cauchy loss during training to handle different levels of data deviation. To evaluate the proposed ORA, extensive experiments are conducted on five benchmark HDI datasets. The results validate that: (1) ORA achieves significantly better representation accuracy than state-of-the-art DNN- and non-DNN-based models, and (2) ORA possesses higher robustness to outlier data than its peers.
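
To make the idea of a Cauchy loss over observed entries concrete, the following is a minimal sketch in Python with NumPy. It is not the paper's implementation: it uses the standard textbook Cauchy (Lorentzian) loss, and the names `cauchy_loss`, `squared_loss`, `mask`, and the scale parameter `gamma` are assumptions introduced here for illustration. The abstract does not say how ORA adapts the smoothness during training, so `gamma` is simply a fixed argument in this sketch.

```python
import numpy as np


def cauchy_loss(observed, predicted, mask, gamma=1.0):
    """Mean standard Cauchy loss over observed entries only.

    gamma is a hypothetical scale parameter (an assumption, not taken
    from the paper): small gamma flattens the loss for large residuals,
    large gamma makes it behave like a squared loss.
    """
    residual = (observed - predicted) * mask  # missing entries contribute 0
    per_entry = 0.5 * gamma**2 * np.log1p((residual / gamma) ** 2)
    return per_entry.sum() / mask.sum()


def squared_loss(observed, predicted, mask):
    """Mean half squared error over observed entries, for comparison."""
    residual = (observed - predicted) * mask
    return 0.5 * (residual**2).sum() / mask.sum()


# Toy incomplete rating matrix; 0 marks a missing entry (an assumption
# made for this example only).
ratings = np.array([[5.0, 0.0, 3.0],
                    [0.0, 4.0, 1.0]])
mask = (ratings > 0).astype(float)
predicted = np.full_like(ratings, 3.0)

loss_cauchy = cauchy_loss(ratings, predicted, mask, gamma=1.0)
loss_squared = squared_loss(ratings, predicted, mask)
loss_cauchy_wide = cauchy_loss(ratings, predicted, mask, gamma=100.0)
```

In this standard form the loss grows only logarithmically with the residual, so one very large deviation contributes far less than it would under a squared loss, and increasing `gamma` moves the loss toward the squared loss.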


Pages: 13
