DEEPLENS: Interactive Out-of-distribution Data Detection in NLP Models

Cited by: 1
Authors
Song, Da [1]
Wang, Zhijie [1]
Huang, Yuheng [1]
Ma, Lei [1, 2]
Zhang, Tianyi [3]
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
[2] Univ Tokyo, Tokyo, Japan
[3] Purdue Univ, W Lafayette, IN, USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Interactive Visualization; Out-of-distribution Detection; Machine Learning; NLP
DOI
10.1145/3544548.3580741
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Machine Learning (ML) has been widely used in Natural Language Processing (NLP) applications. A fundamental assumption in ML is that training data and real-world data should follow a similar distribution. However, a deployed ML model may suffer from out-of-distribution (OOD) issues due to distribution shifts in the real-world data. Though many algorithms have been proposed to detect OOD data from text corpora, there is still a lack of interactive tool support for ML developers. In this work, we propose DEEPLENS, an interactive system that helps users detect and explore OOD issues in massive text corpora. Users can efficiently explore different OOD types in DEEPLENS with the help of a text clustering method. Users can also dig into a specific text by inspecting salient words highlighted through neuron activation analysis. In a within-subjects user study with 24 participants, participants using DEEPLENS were able to find nearly twice as many types of OOD issues accurately, with 22% more confidence, compared with a variant of DEEPLENS that has no interaction or visualization support.
Pages: 17
Related papers
50 in total
  • [41] Dense Out-of-Distribution Detection by Robust Learning on Synthetic Negative Data
    Grcic, Matej
    Bevandic, Petra
    Kalafatic, Zoran
    Segvic, Sinisa
    SENSORS, 2024, 24 (04)
  • [42] A Novel Statistical Measure for Out-of-Distribution Detection in Data Quality Assurance
    Ouyang, Tinghui
    Echizen, Isao
    Seo, Yoshiki
    PROCEEDINGS OF THE 2023 30TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE, APSEC 2023, 2023, : 458 - 464
  • [43] Recognition Models for Distribution and Out-of-Distribution of Human Activities
    Staab, Sergio
    Krissel, Simon
    Luderschmidt, Johannes
    Martin, Ludger
    2022 18TH INTERNATIONAL CONFERENCE ON WIRELESS AND MOBILE COMPUTING, NETWORKING AND COMMUNICATIONS (WIMOB), 2022,
  • [44] Provably Adversarially Robust Detection of Out-of-Distribution Data (Almost) for Free
    Meinke, Alexander
    Bitterwolf, Julian
    Hein, Matthias
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [45] In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation
    Bitterwolf, Julian
    Mueller, Maximilian
    Hein, Matthias
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [46] Out-of-Distribution Data Generation for Fault Detection and Diagnosis in Industrial Systems
    Kafunah, Jefkine
    Verma, Priyanka
    Ali, Muhammad Intizar
    Breslin, John G.
    IEEE ACCESS, 2023, 11 : 135061 - 135073
  • [47] Unmasking the chameleons: A benchmark for out-of-distribution detection in medical tabular data
    Azizmalayeri, Mohammad
    Abu-Hanna, Ameen
    Cina, Giovanni
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2025, 195
  • [48] PAC-Based Formal Verification for Out-of-Distribution Data Detection
    Prashant, Mohit
    Easwaran, Arvind
    2022 6TH INTERNATIONAL CONFERENCE ON SYSTEM RELIABILITY AND SAFETY, ICSRS, 2022, : 300 - 309
  • [49] On the Adversarial Robustness of Out-of-distribution Generalization Models
    Zou, Xin
    Liu, Weiwei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [50] Towards In-Distribution Compatible Out-of-Distribution Detection
    Wu, Boxi
    Jiang, Jie
    Ren, Haidong
    Du, Zifan
    Wang, Wenxiao
    Li, Zhifeng
    Cai, Deng
    He, Xiaofei
    Lin, Binbin
    Liu, Wei
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023, : 10333 - 10341