DEEPLENS: Interactive Out-of-distribution Data Detection in NLP Models

Cited by: 1
Authors
Song, Da [1]
Wang, Zhijie [1]
Huang, Yuheng [1]
Ma, Lei [1,2]
Zhang, Tianyi [3]
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
[2] Univ Tokyo, Tokyo, Japan
[3] Purdue Univ, W Lafayette, IN USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Interactive Visualization; Out-of-distribution Detection; Machine Learning; NLP;
DOI
10.1145/3544548.3580741
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Machine Learning (ML) has been widely used in Natural Language Processing (NLP) applications. A fundamental assumption in ML is that training data and real-world data follow a similar distribution. However, a deployed ML model may suffer from out-of-distribution (OOD) issues due to distribution shifts in real-world data. Although many algorithms have been proposed to detect OOD data in text corpora, interactive tool support for ML developers is still lacking. In this work, we propose DEEPLENS, an interactive system that helps users detect and explore OOD issues in massive text corpora. Users can efficiently explore different OOD types in DEEPLENS with the help of a text clustering method. Users can also dig into a specific text by inspecting salient words highlighted through neuron activation analysis. In a within-subjects user study with 24 participants, those using DEEPLENS accurately found nearly twice as many types of OOD issues, with 22% more confidence, compared with a variant of DEEPLENS that has no interaction or visualization support.
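The abstract does not say which OOD detector or which clustering algorithm DEEPLENS uses. The following is therefore only an illustrative sketch of the general "flag OOD texts, then group them into types" workflow, not the system's implementation. It assumes two stand-ins: the maximum softmax probability (MSP) baseline as the OOD score, and plain k-means over text embeddings as the clustering step. The threshold of 0.7, the toy logits, and the 2-D embeddings are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability; a low value suggests a possibly OOD input."""
    return max(softmax(logits))


def flag_ood(samples, threshold=0.7):
    """Return ids whose MSP falls below a hypothetical threshold (insertion order)."""
    return [sid for sid, logits in samples.items() if msp_score(logits) < threshold]


def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means; centroids start at the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical classifier logits for five texts.
samples = {
    "t1": [4.0, 0.1, 0.2],    # peaked distribution: confident prediction
    "t2": [0.5, 0.4, 0.45],   # flat distribution: low confidence
    "t3": [0.3, 0.35, 0.3],
    "t4": [3.5, 0.0, 0.1],
    "t5": [1.0, 1.1, 0.9],
}
# Hypothetical sentence embeddings for the texts that get flagged.
embeddings = {"t2": [0.0, 0.0], "t3": [0.2, 0.1], "t5": [5.0, 5.0]}

flagged = flag_ood(samples)
labels = kmeans([embeddings[t] for t in flagged], k=2)
groups = dict(zip(flagged, labels))
print(flagged, groups)
```

Grouping the flagged texts, rather than listing them one by one, is what lets a user review candidate OOD "types" cluster by cluster. In practice the embeddings would come from the model under inspection, and the score could be swapped for any other OOD detector.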
Pages: 17