An Outlier-Resilient Autoencoder for Representing High-Dimensional and Incomplete Data

Times cited: 1
Authors
Wu, Di [1]
Hu, Yuanpeng [2]
Liu, Kechen [3]
Li, Jing [2]
Wang, Xianmin [2]
Deng, Song [4]
Zheng, Nenggan [5]
Luo, Xin [1]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[2] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou 510002, Peoples R China
[3] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[4] Nanjing Univ Post & Telecommun, Inst Adv Technol, Nanjing 210003, Peoples R China
[5] Zhejiang Univ Hangzhou, Qiushi Acad Adv Studies QAAS, Hangzhou 310007, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Data models; Loss measurement; Computational modeling; Predictive models; Standards; Recommender systems; Analytical models; High-dimensional and incomplete data; recommendation model; outlier; Cauchy loss; collaborative filtering; LATENT FACTOR-ANALYSIS; RECOMMENDATION; FACTORIZATION;
DOI
10.1109/TETCI.2024.3437370
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
High-dimensional and incomplete (HDI) data commonly arise in Big Data applications such as recommender systems and bioinformatics. Representation learning maps HDI data into a low-dimensional latent space to extract valuable knowledge and patterns. Deep neural networks (DNNs) are currently among the most popular and successful approaches to representing HDI data, owing to their powerful nonlinear learning ability. However, previous DNN-based approaches have focused primarily on developing sophisticated model structures, neglecting the potential adverse effects of outliers. Unfortunately, collected HDI data usually contain outliers; for example, HDI data collected from recommender systems inevitably contain many outlier ratings from malicious users. To address this issue, this paper proposes a novel outlier-resilient autoencoder (termed ORA). Its core idea is an adaptive Cauchy loss strategy that measures the difference between observed and predicted data when an autoencoder represents HDI data. This strategy uses a more aggressive Cauchy loss to impose a higher penalty on outlier data with large deviations, while using a smoother Cauchy loss to capture the nuanced, deeper features of HDI data. As a result, ORA can dynamically adjust the smoothness of the Cauchy loss during training to handle different levels of data deviation. To evaluate ORA, extensive experiments were conducted on five benchmark HDI datasets. The results show that (1) ORA achieves significantly better representation accuracy than state-of-the-art DNN- and non-DNN-based models, and (2) ORA is more robust to outlier data than its peers.
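The abstract does not give ORA's loss formula or its adaptation rule, so the following is only a minimal sketch, assuming the standard Cauchy (Lorentzian) loss rho_c(r) = (c^2 / 2) * ln(1 + (r / c)^2) with a fixed scale c, evaluated only on the observed entries of an HDI matrix. It illustrates two ingredients the abstract names: missing entries contribute nothing to the loss, and the per-entry gradient r / (1 + (r / c)^2) is bounded by c / 2, so one large residual cannot dominate training the way it does under squared error. The names `cauchy_loss`, `cauchy_grad`, and `masked_cauchy_loss`, the dictionary-based sparse storage, and the toy ratings are hypothetical; how ORA maps "aggressive" and "smooth" onto the loss and adjusts it during training is not reproduced here.

```python
import math
from typing import Dict, Tuple

# Sparse HDI matrix: only observed (row, col) -> value entries are stored.
# (Hypothetical storage choice for illustration.)
Entries = Dict[Tuple[int, int], float]


def cauchy_loss(residual: float, c: float) -> float:
    """Standard Cauchy (Lorentzian) loss with scale c > 0:
    rho_c(r) = (c^2 / 2) * ln(1 + (r / c)^2).
    Close to r^2 / 2 when |r| << c; grows only logarithmically when |r| >> c.
    """
    return 0.5 * c * c * math.log1p((residual / c) ** 2)


def cauchy_grad(residual: float, c: float) -> float:
    """d rho_c / d r = r / (1 + (r / c)^2); its magnitude never exceeds c / 2."""
    return residual / (1.0 + (residual / c) ** 2)


def masked_cauchy_loss(observed: Entries, predicted: Entries, c: float) -> float:
    """Mean Cauchy loss over observed entries only.

    Unobserved cells never enter the sum, so the model is not fitted to
    non-existent zeros; extra keys in `predicted` are simply ignored.
    """
    total = sum(cauchy_loss(value - predicted[key], c)
                for key, value in observed.items())
    return total / len(observed)


if __name__ == "__main__":
    # Toy ratings: 5 observed cells of a 3x4 matrix; (2, 3) looks like an outlier.
    observed = {(0, 0): 4.0, (0, 2): 3.0, (1, 1): 5.0, (2, 0): 2.0, (2, 3): 1.0}
    predicted = {(0, 0): 3.8, (0, 2): 3.1, (1, 1): 4.7, (2, 0): 2.2, (2, 3): 4.5}
    outlier_residual = observed[(2, 3)] - predicted[(2, 3)]  # -3.5
    for c in (0.5, 1.0, 5.0):
        print(f"c={c}: loss={masked_cauchy_loss(observed, predicted, c):.4f}, "
              f"outlier grad={cauchy_grad(outlier_residual, c):.4f} "
              f"(squared-error grad would be {outlier_residual:.1f})")
```

Increasing c makes rho_c behave like r^2 / 2 over a wider range of residuals, while decreasing c caps the influence of large residuals sooner. An adaptive scheme such as ORA's would vary this behavior during training, by a rule the abstract does not specify.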
Pages: 13
Related papers (50 in total)
  • [41] Outlier mining based on Variance of Angle technology research in High-Dimensional Data
    Liu, Wenting
    Pan, Ruikai
    2015 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND KNOWLEDGE ENGINEERING (ISKE), 2015, : 598 - 603
  • [42] Computationally Efficient Outlier Detection for High-Dimensional Data Using the MDP Algorithm
    Tsagris, Michail
    Papadakis, Manos
    Alenazi, Abdulaziz
    Alzeley, Omar
    COMPUTATION, 2024, 12 (09)
  • [43] An Unbiased Distance-Based Outlier Detection Approach for High-Dimensional Data
    Hoang Vu Nguyen
    Gopalkrishnan, Vivekanand
    Assent, Ira
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PT I, 2011, 6587 : 138 - +
  • [44] High-dimensional data stream outlier detection algorithm based on angle distribution
    Lu, S.
    1600, Shanghai Jiaotong University (48)
  • [45] Projected outlier detection in high-dimensional mixed-attributes data set
    Ye, Mao
    Li, Xue
    Orlowska, Maria E.
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 7104 - 7113
  • [46] A novel autoencoder approach to feature extraction with linear separability for high-dimensional data
    Zheng, Jian
    Qu, Hongchun
    Li, Zhaoni
    Li, Lin
    Tang, Xiaoming
    Guo, Fei
    PEERJ COMPUTER SCIENCE, 2022, 8
  • [47] Graph Linear Convolution Pooling for Learning in Incomplete High-Dimensional Data
    Bi, Fanghui
    He, Tiantian
    Ong, Yew-Soon
    Luo, Xin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2025, 37 (04) : 1838 - 1852
  • [49] Multiple imputation for high-dimensional mixed incomplete continuous and binary data
    He, Ren
    Belin, Thomas
    STATISTICS IN MEDICINE, 2014, 33 (13) : 2251 - 2262
  • [50] Random subspace ensemble for directly classifying high-dimensional incomplete data
    Tran, Cao Truong
    Nguyen, Binh P.
    EVOLUTIONARY INTELLIGENCE, 2024, 17 (5-6) : 3303 - 3315