An Outlier-Resilient Autoencoder for Representing High-Dimensional and Incomplete Data

Cited by: 1
Authors
Wu, Di [1]
Hu, Yuanpeng [2]
Liu, Kechen [3]
Li, Jing [2]
Wang, Xianmin [2]
Deng, Song [4]
Zheng, Nenggan [5]
Luo, Xin [1]
Affiliations
[1] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[2] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou 510002, Peoples R China
[3] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[4] Nanjing Univ Post & Telecommun, Inst Adv Technol, Nanjing 210003, Peoples R China
[5] Zhejiang Univ Hangzhou, Qiushi Acad Adv Studies QAAS, Hangzhou 310007, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data models; Loss measurement; Computational modeling; Predictive models; Standards; Recommender systems; Analytical models; High-dimensional and incomplete data; recommendation model; outlier; Cauchy loss; collaborative filtering; LATENT FACTOR-ANALYSIS; RECOMMENDATION; FACTORIZATION
DOI
10.1109/TETCI.2024.3437370
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High-dimensional and incomplete (HDI) data commonly arise in various Big Data-related applications, e.g., recommender systems and bioinformatics. Representation learning maps HDI data into a low-dimensional latent space to extract valuable knowledge and patterns. Currently, deep neural networks (DNNs) are among the most popular and successful approaches to representing HDI data owing to their powerful nonlinear learning ability. However, previous DNN-based approaches have focused primarily on designing more sophisticated model structures, neglecting the potential adverse effects of outliers. Unfortunately, outliers usually exist in collected HDI data. For example, HDI data collected from recommender systems inevitably contain many outlier ratings due to some malicious users. To address this issue, this paper proposes a novel outlier-resilient autoencoder (termed ORA). Its core idea is an adaptive Cauchy loss strategy that measures the difference between the observed and predicted data when an autoencoder represents the HDI data. This strategy leverages a more aggressive Cauchy loss to impose a higher penalty on outlier data with large deviation, while utilizing a smoother Cauchy loss to capture the nuanced, deeper features of HDI data. As such, ORA can dynamically adjust the smoothness of the Cauchy loss during training to handle different levels of data deviation. To evaluate the proposed ORA, extensive experiments are conducted on five benchmark HDI datasets. The results validate that: (1) ORA achieves significantly better representation accuracy than state-of-the-art DNN- and non-DNN-based models, and (2) ORA possesses higher robustness to outlier data than its peers.
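As a rough illustration of the loss the abstract refers to, the sketch below evaluates the standard Cauchy loss, log(1 + (e/γ)²), on the observed entries of a toy rating vector and compares it with a squared loss. The abstract does not give ORA's exact formulation or its adaptation rule, so the function names, the use of `None` for missing entries, the scale parameter `gamma`, and the two fixed `gamma` values are all assumptions made for this example, not the authors' implementation.

```python
import math


def cauchy_loss(observed, predicted, gamma):
    """Mean Cauchy loss log(1 + (e / gamma)^2) over observed entries only.

    Assumptions of this sketch: missing entries are encoded as None,
    and gamma is a positive scale that controls how smooth the loss is.
    """
    terms = [
        math.log1p(((o - p) / gamma) ** 2)
        for o, p in zip(observed, predicted)
        if o is not None
    ]
    return sum(terms) / len(terms)


def squared_loss(observed, predicted):
    """Mean squared error over observed entries, for comparison."""
    terms = [(o - p) ** 2 for o, p in zip(observed, predicted) if o is not None]
    return sum(terms) / len(terms)


# Toy incomplete data: half the entries are missing, and the observed
# rating 1.0 is far from its prediction 4.5 (a stand-in for an outlier).
observed = [5.0, None, 3.0, None, 1.0, None]
predicted = [4.5, 2.0, 3.2, 3.0, 4.5, 2.5]

l2 = squared_loss(observed, predicted)
sharp = cauchy_loss(observed, predicted, gamma=1.0)   # smaller scale
smooth = cauchy_loss(observed, predicted, gamma=5.0)  # larger scale, flatter loss

print(f"squared loss:         {l2:.3f}")
print(f"Cauchy loss, gamma=1: {sharp:.3f}")
print(f"Cauchy loss, gamma=5: {smooth:.3f}")
```

Because the Cauchy loss grows only logarithmically in the residual, the single large deviation contributes far less to the total than it does under the squared loss. Changing `gamma` changes how flat the loss is, which is the kind of knob an adaptive scheme could tune during training.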
Pages: 13
Related papers
50 in total
  • [21] CLINCH: Clustering incomplete high-dimensional data for data mining application
    Cheng, ZP
    Zhou, D
    Wang, C
    Guo, JK
    Wang, W
    Ding, BK
    Shi, B
    WEB TECHNOLOGIES RESEARCH AND DEVELOPMENT - APWEB 2005, 2005, 3399 : 88 - 99
  • [22] OUTLIER DETECTION WITH ENHANCED ANGLE-BASED OUTLIER FACTOR IN HIGH-DIMENSIONAL DATA STREAM
    Shou, Zhaoyu
    Tian, Hao
    Li, Simin
    Zou, Fengbo
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2018, 14 (05): : 1633 - 1651
  • [23] A hybrid dimensionality reduction method for outlier detection in high-dimensional data
    Meng, Guanglei
    Wang, Biao
    Wu, Yanming
    Zhou, Mingzhe
    Meng, Tiankuo
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (11) : 3705 - 3718
  • [24] Unsupervised Artificial Neural Networks for Outlier Detection in High-Dimensional Data
    Popovic, Daniel
    Fouche, Edouard
    Boehm, Klemens
    ADVANCES IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2019, 2019, 11695 : 3 - 19
  • [25] Fast outlier detection for high-dimensional data of wireless sensor networks
    Qiao, Yan
    Cui, Xinhong
    Jin, Peng
    Zhang, Wu
    INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2020, 16 (10)
  • [26] Ordinal Outlier Algorithm for Anomaly Detection of High-Dimensional Data Sets
    Chen, Gang
    Du, Linlin
    An, Baoran
    PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020, : 5356 - 5361
  • [27] OUTLIER DETECTION BASED ON DENSITY OF HYPERCUBE IN HIGH-DIMENSIONAL DATA STREAM
    Shou, Zhaoyu
    Zou, Fengbo
    Li, Simin
    Lu, Xianying
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2019, 15 (03): : 873 - 889
  • [28] A hybrid dimensionality reduction method for outlier detection in high-dimensional data
    Guanglei Meng
    Biao Wang
    Yanming Wu
    Mingzhe Zhou
    Tiankuo Meng
    International Journal of Machine Learning and Cybernetics, 2023, 14 : 3705 - 3718
  • [29] Multiple imputation and analysis for high-dimensional incomplete proteomics data
    Yin, Xiaoyan
    Levy, Daniel
    Willinger, Christine
    Adourian, Aram
    Larson, Martin G.
    STATISTICS IN MEDICINE, 2016, 35 (08) : 1315 - 1326
  • [30] Feature Selection and Classification for High-Dimensional Incomplete Multimodal Data
    Deng, Wan-Yu
    Liu, Dan
    Dong, Ying-Ying
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2018, 2018