An Outlier-Resilient Autoencoder for Representing High-Dimensional and Incomplete Data

Cited by: 1
Authors
Wu, Di [1]
Hu, Yuanpeng [2]
Liu, Kechen [3]
Li, Jing [2]
Wang, Xianmin [2]
Deng, Song [4]
Zheng, Nenggan [5]
Luo, Xin [1]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[2] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou 510002, Peoples R China
[3] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[4] Nanjing Univ Post & Telecommun, Inst Adv Technol, Nanjing 210003, Peoples R China
[5] Zhejiang Univ Hangzhou, Qiushi Acad Adv Studies QAAS, Hangzhou 310007, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data models; Loss measurement; Computational modeling; Predictive models; Standards; Recommender systems; Analytical models; High-dimensional and incomplete data; recommendation model; outlier; Cauchy loss; collaborative filtering; LATENT FACTOR-ANALYSIS; RECOMMENDATION; FACTORIZATION
DOI
10.1109/TETCI.2024.3437370
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High-dimensional and incomplete (HDI) data commonly arise in Big Data applications such as recommender systems and bioinformatics. Representation learning maps HDI data into a low-dimensional latent space to extract valuable knowledge and patterns. Deep neural networks (DNNs) are currently among the most popular and successful approaches to representing HDI data because of their powerful nonlinear learning ability. However, previous DNN-based approaches have focused primarily on more sophisticated model structures and neglected the potential adverse effects of outliers, which usually exist in collected HDI data. For example, HDI data collected from recommender systems inevitably contain many outlier ratings due to malicious users. To address this issue, this paper proposes a novel outlier-resilient autoencoder (ORA). Its core idea is an adaptive Cauchy loss strategy that measures the difference between the observed and predicted data when an autoencoder represents HDI data. This strategy leverages a more aggressive Cauchy loss to impose a higher penalty on outlier data with large deviation, while using a smoother Cauchy loss to capture the nuanced, deeper features of HDI data. ORA can thus dynamically adjust the smoothness of the Cauchy loss during training to handle different levels of data deviation. ORA is evaluated in extensive experiments on five benchmark HDI datasets. The results show that (1) ORA achieves significantly better representation accuracy than state-of-the-art DNN-based and non-DNN-based models, and (2) ORA is more robust to outlier data than its peers.
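The abstract does not give the loss formula or the rule ORA uses to adapt it, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes one common textbook form of the Cauchy loss, (gamma^2 / 2) * log(1 + (r / gamma)^2), evaluated on observed entries only. The names `cauchy_loss`, `masked_cauchy_loss`, and the scale parameter `gamma` are hypothetical, as are the toy ratings.

```python
import numpy as np

def cauchy_loss(residual, gamma):
    """Assumed textbook Cauchy loss: (gamma^2 / 2) * log(1 + (r / gamma)^2)."""
    return 0.5 * gamma**2 * np.log1p((residual / gamma) ** 2)

def masked_cauchy_loss(observed, predicted, mask, gamma):
    """Mean Cauchy loss over observed entries only (mask == 1)."""
    # Missing entries get a zero residual, which contributes zero loss.
    residual = (observed - predicted) * mask
    return cauchy_loss(residual, gamma).sum() / mask.sum()

# Hypothetical toy rating matrix; a 0 in the mask marks a missing entry.
observed = np.array([[5.0, 0.0, 1.0],
                     [4.0, 2.0, 0.0]])
mask = np.array([[1.0, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])
predicted = np.full_like(observed, 3.0)

print("masked loss:", masked_cauchy_loss(observed, predicted, mask, gamma=1.0))

# Compare how strongly a large residual is weighted relative to a small one
# under two scale values, against the ratio given by a squared loss.
small_residual, outlier_residual = 0.5, 10.0
for gamma in (0.5, 5.0):
    ratio = cauchy_loss(outlier_residual, gamma) / cauchy_loss(small_residual, gamma)
    print(f"gamma={gamma}: outlier/inlier loss ratio = {ratio:.1f}")
print("squared-loss ratio:", (outlier_residual / small_residual) ** 2)
```

In this form the loss behaves like half the squared error for residuals much smaller than `gamma` and grows only logarithmically for large ones, so `gamma` controls how smooth the loss is and how much a large deviation counts. Changing `gamma` during training is one way such a loss could be made adaptive; how ORA actually schedules it is not stated in this record.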
Pages: 13