Detecting Errors with Zero-Shot Learning

Cited by: 2
Authors
Wu, Xiaoyu [1, 2]
Wang, Ning [1, 2]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
[2] Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
error detection; zero-shot learning; self-attention mechanism; knowledge base
DOI
10.3390/e24070936
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
Error detection is a critical step in data cleaning. Most traditional error detection methods rely on rules and external information, which are costly to obtain, especially for large-scale data. Recently, with the advances of deep learning, some researchers have turned to learning the semantic distribution of data for error detection; however, the low error rate in real datasets makes it hard to collect negative samples for training supervised deep learning models. Most existing deep-learning-based error detection algorithms address this class imbalance problem through data augmentation. Because negative samples are inadequately sampled, the features learned by those methods may be biased. In this paper, we propose SAT-GAN (Self-Attention Generative Adversarial Network), a deep learning model based on AEGAN (Auto-Encoder Generative Adversarial Network), to detect errors in relational datasets. By combining the self-attention mechanism with a pre-trained language model, our model can capture semantic features of the dataset, specifically the functional dependencies between attributes, so that SAT-GAN needs no rules or constraints to identify inconsistent data. To cope with the lack of negative samples, we propose to train our model via zero-shot learning. As a model tailored to clean data, SAT-GAN learns the latent features of clean data and recognizes erroneous data as outliers. In our evaluation, SAT-GAN achieves an average F1-score of 0.95 on five datasets, which is at least a 46.2% F1-score improvement over rule-based methods, and it outperforms state-of-the-art deep learning approaches in the absence of rules and negative samples.
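The idea of learning only from clean data and treating errors as outliers can be illustrated with a toy sketch. This is not SAT-GAN: there is no GAN, self-attention, or language model here. As a stand-in for learned latent features, it simply memorizes which attribute value pairs co-occur in clean rows and scores a new row by how many of its pairs were never seen. All names (`fit_clean_profile`, `outlier_score`, `ATTRS`) and the sample rows are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

ATTRS = ["zip", "city", "state"]  # hypothetical schema

def fit_clean_profile(clean_rows, attributes):
    """Record, for every ordered attribute pair, the value pairs seen in clean rows."""
    seen = defaultdict(set)
    for row in clean_rows:
        for a in attributes:
            for b in attributes:
                if a != b:
                    seen[(a, b)].add((row[a], row[b]))
    return seen

def outlier_score(row, seen, attributes):
    """Fraction of the row's ordered attribute pairs never observed in clean data."""
    pairs = [(a, b) for a in attributes for b in attributes if a != b]
    unseen = sum((row[a], row[b]) not in seen.get((a, b), set()) for a, b in pairs)
    return unseen / len(pairs)

# Training uses clean rows only; no erroneous (negative) samples are needed.
clean_rows = [
    {"zip": "100044", "city": "Beijing", "state": "BJ"},
    {"zip": "200000", "city": "Shanghai", "state": "SH"},
]
profile = fit_clean_profile(clean_rows, ATTRS)

# A row whose zip code is inconsistent with its city and state.
dirty = {"zip": "100044", "city": "Shanghai", "state": "SH"}

clean_score = outlier_score(clean_rows[0], profile, ATTRS)
dirty_score = outlier_score(dirty, profile, ATTRS)
print(clean_score, dirty_score)
```

A row that matches the clean profile scores 0, while the inconsistent row scores well above 0, so a threshold on the score separates them without any rule being written by hand. The model in the abstract replaces this lookup with learned representations, which is what lets it generalize beyond value pairs it has literally seen.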
Pages: 14