DEEPLENS: Interactive Out-of-distribution Data Detection in NLP Models

Cited by: 1

Authors:

Song, Da [1]

Wang, Zhijie [1]

Huang, Yuheng [1]

Ma, Lei [1, 2]

Zhang, Tianyi [3]

Affiliations:

[1] Univ Alberta, Edmonton, AB, Canada

[2] Univ Tokyo, Tokyo, Japan

[3] Purdue Univ, W Lafayette, IN USA

Source:

PROCEEDINGS OF THE 2023 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2023 | 2023

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC)

Keywords:

Interactive Visualization; Out-of-distribution Detection; Machine Learning; NLP;

DOI:

10.1145/3544548.3580741

Chinese Library Classification (CLC) Number:

TP [Automation Technology, Computer Technology]

Discipline Code:

0812

Abstract:

Machine Learning (ML) has been widely used in Natural Language Processing (NLP) applications. A fundamental assumption in ML is that training data and real-world data follow a similar distribution. However, a deployed ML model may suffer from out-of-distribution (OOD) issues caused by distribution shifts in real-world data. Although many algorithms have been proposed to detect OOD data in text corpora, ML developers still lack interactive tool support. In this work, we propose DeepLens, an interactive system that helps users detect and explore OOD issues in massive text corpora. With the help of a text clustering method, users can efficiently explore different OOD types in DeepLens. Users can also dig into a specific text by inspecting salient words highlighted through neuron activation analysis. In a within-subjects user study with 24 participants, participants using DeepLens accurately found nearly twice as many types of OOD issues, with 22% more confidence, compared with a variant of DeepLens that has no interaction or visualization support.

Pages: 17