An Outlier-Resilient Autoencoder for Representing High-Dimensional and Incomplete Data

被引:1
|
作者
Wu, Di [1 ]
Hu, Yuanpeng [2 ]
Liu, Kechen [3 ]
Li, Jing [2 ]
Wang, Xianmin [2 ]
Deng, Song [4 ]
Zheng, Nenggan [5 ]
Luo, Xin [1 ]
机构
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[2] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou 510002, Peoples R China
[3] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[4] Nanjing Univ Post & Telecommun, Inst Adv Technol, Nanjing 210003, Peoples R China
[5] Zhejiang Univ Hangzhou, Qiushi Acad Adv Studies QAAS, Hangzhou 310007, Zhejiang, Peoples R China
基金
中国国家自然科学基金;
关键词
Data models; Loss measurement; Computational modeling; Predictive models; Standards; Recommender systems; Analytical models; High-dimensional and incomplete data; recommendation model; outlier; cauchy loss; collaborative filtering; LATENT FACTOR-ANALYSIS; RECOMMENDATION; FACTORIZATION;
D O I
10.1109/TETCI.2024.3437370
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
High-dimensional and incomplete (HDI) data commonly arise in various Big Data-related applications, e.g., recommender systems and bioinformatics. Representation is a learning paradigm to map HDI data into low-dimensional latent space for attracting valuable knowledge and patterns. Currently, deep neural network (DNN) is one of the most popular and successful approaches to represent HDI data due to its powerful nonlinear learning ability. However, previous DNNs-based approaches primarily focused on advancing the sophisticated model structure, neglecting the potential adverse effects of outliers. Unfortunately, outliers usually exist in the collected HDI data. For example, HDI data collected from recommender systems inevitably contain many outlier ratings due to some malicious users. To address this issue, this paper proposes a novel outlier-resilient autoencoder (termed ORA). Its core idea is to design an adaptive Cauchy loss strategy to measure the difference between the observed and predicted data for an autoencoder in representing the HDI data. This strategy leverages a more aggressive Cauchy loss to impose a higher penalty on outlier data with large deviation, while utilizing a smoother Cauchy loss to capture the nuanced, deeper features of HDI data. As such, ORA can dynamically adjust the smoothness of the Cauchy loss during training to handle different levels of data deviation. To evaluate the proposed ORA, extensive experiments are conducted on five benchmark HDI datasets. The results validate that: (1) ORA achieves significantly better representation accuracy than State-of-the-Art DNN- and non-DNN-based models, and (2) ORA possesses higher robustness to outlier data than its peers.
引用
收藏
页数:13
相关论文
共 50 条
  • [1] Variational autoencoder-based outlier detection for high-dimensional data
    Li, Yongmou
    Wang, Yijie
    Ma, Xingkong
    INTELLIGENT DATA ANALYSIS, 2019, 23 (05) : 991 - 1002
  • [2] MMA: Multi-Metric-Autoencoder for Analyzing High-Dimensional and Incomplete Data
    Liang, Cheng
    Wu, Di
    He, Yi
    Huang, Teng
    Chen, Zhong
    Luo, Xin
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT V, 2023, 14173 : 3 - 19
  • [3] Outlier detection for high-dimensional data
    Ro, Kwangil
    Zou, Changliang
    Wang, Zhaojun
    Yin, Guosheng
    BIOMETRIKA, 2015, 102 (03) : 589 - 599
  • [4] Intrinsic dimensional outlier detection in high-dimensional data
    Von Brünken, Jonathan
    Houle, Michael E.
    Zimek, Arthur
    NII Technical Reports, 2015, (03): : 1 - 12
  • [5] Efficient Outlier Detection for High-Dimensional Data
    Liu, Huawen
    Li, Xuelong
    Li, Jiuyong
    Zhang, Shichao
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2018, 48 (12): : 2451 - 2461
  • [6] A geometric framework for outlier detection in high-dimensional data
    Herrmann, Moritz
    Pfisterer, Florian
    Scheipl, Fabian
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 13 (03)
  • [7] A Comparison of Outlier Detection Techniques for High-Dimensional Data
    Xu, Xiaodan
    Liu, Huawen
    Li, Li
    Yao, Minghai
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2018, 11 (01) : 652 - 662
  • [8] Adaptive Clustering for Outlier Identification in High-Dimensional Data
    Thudumu, Srikanth
    Branch, Philip
    Jin, Jiong
    Singh, Jugdutt
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2019, PT II, 2020, 11945 : 215 - 228
  • [9] Variable selection for high-dimensional incomplete data
    Liang, Lixing
    Zhuang, Yipeng
    Yu, Philip L. H.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2024, 192
  • [10] Outlier mining in large high-dimensional data sets
    Angiulli, F
    Pizzuti, C
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (02) : 203 - 215