Detecting Errors with Zero-Shot Learning

Cited by: 2
Authors
Wu, Xiaoyu [1, 2]
Wang, Ning [1, 2]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
[2] Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
error detection; zero-shot learning; self-attention mechanism; knowledge base
DOI
10.3390/e24070936
Chinese Library Classification
O4 [Physics]
Discipline classification code
0702
Abstract
Error detection is a critical step in data cleaning. Most traditional error detection methods rely on rules and external information, which is costly, especially for large-scale data. With recent advances in deep learning, some researchers have turned to learning the semantic distribution of data for error detection; however, the low error rate in real datasets makes it hard to collect negative samples for training supervised deep learning models. Most existing deep-learning-based error detection algorithms address this class imbalance through data augmentation, but because negative samples are inadequately sampled, the features these methods learn may be biased. In this paper, we propose SAT-GAN (Self-Attention Generative Adversarial Network), a deep learning model based on AEGAN (Auto-Encoder Generative Adversarial Network), to detect errors in relational datasets. By combining the self-attention mechanism with a pre-trained language model, SAT-GAN captures semantic features of the dataset, specifically the functional dependencies between attributes, so it needs no rules or constraints to identify inconsistent data. To cope with the lack of negative samples, we train the model via zero-shot learning. As a model tailored to clean data, SAT-GAN learns the latent features of clean data and recognizes erroneous data as outliers. In our evaluation, SAT-GAN achieves an average F1-score of 0.95 on five datasets, an improvement of at least 46.2% in F1-score over rule-based methods, and outperforms state-of-the-art deep learning approaches in the absence of rules and negative samples.
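The idea of learning only from clean data and treating anything that does not fit as an outlier can be illustrated with a deliberately simple stand-in. The sketch below is not SAT-GAN: it replaces the learned latent features with a plain lookup of attribute value pairs seen in clean rows, and the column names (`zip`, `city`), the sample rows, and the helper functions are all hypothetical.

```python
from collections import defaultdict


def fit_clean_profile(clean_rows, lhs, rhs):
    """Record which rhs values co-occur with each lhs value in clean rows."""
    profile = defaultdict(set)
    for row in clean_rows:
        profile[row[lhs]].add(row[rhs])
    return profile


def is_outlier(row, profile, lhs, rhs):
    """Flag a row whose (lhs, rhs) pair never appeared in the clean rows."""
    seen = profile.get(row[lhs])
    return seen is None or row[rhs] not in seen


# Hypothetical clean training data: no erroneous (negative) samples needed.
clean_rows = [
    {"zip": "100044", "city": "Beijing"},
    {"zip": "200000", "city": "Shanghai"},
    {"zip": "100044", "city": "Beijing"},
]
profile = fit_clean_profile(clean_rows, "zip", "city")

test_rows = [
    {"zip": "100044", "city": "Beijing"},   # consistent with the clean data
    {"zip": "100044", "city": "Shanghai"},  # breaks the zip -> city dependency
]
flags = [is_outlier(r, profile, "zip", "city") for r in test_rows]
print(flags)
```

A lookup like this also flags any `zip` value it has never seen, which is the usual cost of modeling only clean data; a learned model, as described in the abstract, is meant to generalize beyond exact matches.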
Pages: 14