Detecting Errors with Zero-Shot Learning

Cited by: 2
Authors
Wu, Xiaoyu [1, 2]
Wang, Ning [1, 2]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
[2] Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
error detection; zero-shot learning; self-attention mechanism; knowledge base
DOI
10.3390/e24070936
Chinese Library Classification number
O4 [Physics];
Discipline classification code
0702;
Abstract
Error detection is a critical step in data cleaning. Most traditional error detection methods rely on rules and external information, which is costly, especially for large-scale data. Recently, with advances in deep learning, some researchers have focused on learning the semantic distribution of data for error detection; however, the low error rate in real datasets makes it hard to collect negative samples for training supervised deep learning models. Most existing deep-learning-based error detection algorithms address this class imbalance problem through data augmentation. Because negative samples are inadequately sampled, the features learned by those methods may be biased. In this paper, we propose SAT-GAN (Self-Attention Generative Adversarial Network), a deep learning model based on AEGAN (Auto-Encoder Generative Adversarial Network), to detect errors in relational datasets. By combining the self-attention mechanism with a pre-trained language model, our model captures semantic features of the dataset, specifically the functional dependencies between attributes, so SAT-GAN needs no rules or constraints to identify inconsistent data. To address the lack of negative samples, we train our model via zero-shot learning. As a model tailored to clean data, SAT-GAN learns the latent features of clean data and recognizes erroneous data as outliers. In our evaluation, SAT-GAN achieves an average F1-score of 0.95 on five datasets, an improvement of at least 46.2% in F1-score over rule-based methods, and outperforms state-of-the-art deep learning approaches in the absence of rules and negative samples.
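The idea of learning only from clean data and flagging errors as outliers can be illustrated with a much simpler stand-in. The sketch below is not SAT-GAN: it swaps the GAN and the language model for a linear autoencoder (PCA fitted with NumPy), and the function names, the toy dependency b = 2a, and the threshold rule are all assumptions made for illustration. A row that violates the dependency reconstructs poorly and is flagged.

```python
import numpy as np

def fit_clean_model(clean, n_components=1):
    """Fit a linear autoencoder (PCA) on clean rows only (hypothetical helper)."""
    mean = clean.mean(axis=0)
    # Rows of vt are the principal directions of the centered clean data.
    _, _, vt = np.linalg.svd(clean - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(rows, mean, components):
    """Distance between each row and its projection onto the learned subspace."""
    centered = rows - mean
    reconstructed = centered @ components.T @ components
    return np.linalg.norm(centered - reconstructed, axis=1)

def detect_errors(rows, mean, components, threshold):
    """Flag rows whose reconstruction error exceeds the threshold."""
    return reconstruction_error(rows, mean, components) > threshold

# Toy clean table: column b depends on column a (b = 2a, plus small noise).
rng = np.random.default_rng(0)
a = rng.uniform(0.0, 10.0, size=200)
clean = np.column_stack([a, 2.0 * a + rng.normal(0.0, 0.05, size=200)])

mean, components = fit_clean_model(clean)
# Assumed threshold rule: three times the worst error seen on clean data.
threshold = 3.0 * reconstruction_error(clean, mean, components).max()

# First row respects the dependency; second row violates it.
test_rows = np.array([[4.0, 8.0], [4.0, 1.0]])
flags = detect_errors(test_rows, mean, components, threshold)
print(flags)
```

No erroneous rows are used for fitting or for choosing the threshold, which mirrors the setting described in the abstract; a real relational dataset would additionally need an encoding of its textual attributes.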
Pages: 14