Understanding Out-of-distribution: A Perspective of Data Dynamics

Cited by: 0
Authors
Adila, Dyah [1]
Kang, Dongyeop [2]
Affiliations
[1] Univ Wisconsin Madison, Dept Comp Sci, Madison, WI 53706 USA
[2] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Despite the success of machine learning models on Natural Language Processing (NLP) tasks, their predictions frequently fail on out-of-distribution (OOD) samples. Prior work has focused on developing state-of-the-art methods for detecting OOD samples, leaving unanswered the fundamental question of how OOD samples differ from in-distribution samples. This paper examines in detail how data dynamics during model training can be used to understand the fundamental differences between OOD and in-distribution samples. We found that the syntactic characteristics of samples the model consistently predicts incorrectly in the OOD setting directly contradict those of samples it consistently predicts incorrectly in the in-distribution setting. In addition, we observed preliminary evidence supporting the hypothesis that models are more likely to latch onto trivial syntactic heuristics (e.g., word overlap between two sentences) when making predictions on OOD samples. We hope this preliminary study accelerates data-centric analysis of various machine learning phenomena.
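The abstract does not specify how data dynamics or word overlap are measured. The sketch below assumes a dataset-cartography-style approach (Swayamdipta et al., 2020), which summarizes each training sample by the probability the model assigns to its gold label across epochs, plus a simple whitespace-token overlap ratio for sentence pairs. The function names and the `max_correctness` threshold are hypothetical illustrations, not the authors' code.

```python
from statistics import mean, pstdev


def training_dynamics(gold_probs, correct_flags):
    """Summarize one sample's training dynamics across epochs.

    Assumption: statistics follow the dataset-cartography recipe.
    gold_probs[e]: probability assigned to the gold label after epoch e.
    correct_flags[e]: whether the predicted label matched the gold label.
    """
    if not gold_probs or len(gold_probs) != len(correct_flags):
        raise ValueError("need one probability and one flag per epoch")
    return {
        "confidence": mean(gold_probs),     # average gold-label probability
        "variability": pstdev(gold_probs),  # spread of that probability over epochs
        "correctness": sum(correct_flags) / len(correct_flags),  # fraction of epochs correct
    }


def consistently_wrong(stats, max_correctness=0.0):
    """Hypothetical criterion: the sample is (almost) never predicted correctly."""
    return stats["correctness"] <= max_correctness


def word_overlap(sentence_a, sentence_b):
    """Fraction of sentence_b's tokens that also occur in sentence_a.

    Simplification: lowercase whitespace tokens, punctuation not stripped.
    """
    tokens_a = set(sentence_a.lower().split())
    tokens_b = sentence_b.lower().split()
    if not tokens_b:
        return 0.0
    return sum(tok in tokens_a for tok in tokens_b) / len(tokens_b)


if __name__ == "__main__":
    stats = training_dynamics([0.10, 0.20, 0.15], [False, False, False])
    print(consistently_wrong(stats))  # True
    # Full lexical overlap even though the second sentence is not entailed by the first.
    print(word_overlap("The doctor saw the lawyer", "The lawyer saw the doctor"))  # 1.0
```

Under this reading, a consistently mispredicted sample has low confidence and zero (or near-zero) correctness across epochs; comparing the word overlap of such samples in OOD versus in-distribution data is one possible way to probe the word-overlap hypothesis.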
Pages: 1-8
Number of pages: 8