Understanding Out-of-Distribution: A Perspective of Data Dynamics

Cited by: 0
Authors
Adila, Dyah [1]
Kang, Dongyeop [2]
Affiliations
[1] Univ Wisconsin Madison, Dept Comp Sci, Madison, WI 53706 USA
[2] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite the success of machine learning models on Natural Language Processing (NLP) tasks, their predictions frequently fail on out-of-distribution (OOD) samples. Prior work has focused on developing state-of-the-art methods for detecting OOD samples, but the fundamental question of how OOD samples differ from in-distribution samples remains unanswered. This paper explores, in extensive detail, how the data dynamics of model training can be used to understand the fundamental differences between OOD and in-distribution samples. We found that the syntactic characteristics of the samples a model consistently predicts incorrectly directly contradict each other between the OOD and in-distribution cases. In addition, we observed preliminary evidence supporting the hypothesis that models are more likely to latch onto trivial syntactic heuristics (e.g., word overlap between two sentences) when making predictions on OOD samples. We hope our preliminary study accelerates data-centric analysis of various machine learning phenomena.
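The abstract does not state how data dynamics are computed. As a minimal sketch, assuming per-sample training-dynamics statistics in the style of dataset cartography (mean gold-label probability across epochs, its variability, and the fraction of epochs in which the sample is predicted correctly), the Python below shows how consistently mispredicted samples could be flagged, together with a simple word-overlap ratio of the kind the abstract names as a trivial heuristic. The function names, the `threshold` parameter, and the whitespace tokenization are hypothetical illustrations, not the authors' procedure.

```python
from statistics import mean, pstdev


def training_dynamics(gold_probs, correct):
    """Summarize one sample's behavior across training epochs (hypothetical helper).

    gold_probs: probability the model assigned to the gold label at each epoch.
    correct: whether the model's prediction was correct at each epoch.
    """
    return {
        "confidence": mean(gold_probs),               # average gold-label probability
        "variability": pstdev(gold_probs),            # spread of that probability over epochs
        "correctness": sum(correct) / len(correct),   # fraction of epochs predicted correctly
    }


def consistently_wrong(stats, threshold=0.0):
    """Hypothetical rule: flag a sample whose correctness never exceeds `threshold`."""
    return stats["correctness"] <= threshold


def word_overlap(premise, hypothesis):
    """Fraction of hypothesis word types that also occur in the premise.

    Uses naive lowercase whitespace tokenization (an assumption for illustration).
    """
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / len(h) if h else 0.0
```

For example, `word_overlap("the doctor saw the lawyer", "the lawyer saw the doctor")` returns 1.0 even though the two sentences differ in meaning, which is exactly the situation in which a model relying on overlap would be misled.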
Pages: 1–8
Number of pages: 8
Related papers
50 in total
  • [41] Reliable deep learning in anomalous diffusion against out-of-distribution dynamics
    Feng, Xiaochen
    Sha, Hao
    Zhang, Yongbing
    Su, Yaoquan
    Liu, Shuai
    Jiang, Yuan
    Hou, Shangguo
    Han, Sanyang
    Ji, Xiangyang
    NATURE COMPUTATIONAL SCIENCE, 2024, 4 (10): : 761 - 772
  • [42] OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization
    Ye, Nanyang
    Li, Kaican
    Bai, Haoyue
    Yu, Runpeng
    Hong, Lanqing
    Zhou, Fengwei
    Li, Zhenguo
    Zhu, Jun
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 7937 - 7948
  • [43] Distribution Shift Inversion for Out-of-Distribution Prediction
    Yu, Runpeng
    Liu, Songhua
    Yang, Xingyi
    Wang, Xinchao
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 3592 - 3602
  • [44] Continually Learning Out-of-Distribution Spatiotemporal Data for Robust Energy Forecasting
    Prabowo, Arian
    Chen, Kaixuan
    Xue, Hao
    Sethuvenkatraman, Subbu
    Salim, Flora D.
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2023, PT VII, 2023, 14175 : 3 - 19
  • [45] Dense Out-of-Distribution Detection by Robust Learning on Synthetic Negative Data
    Grcic, Matej
    Bevandic, Petra
    Kalafatic, Zoran
    Segvic, Sinisa
    SENSORS, 2024, 24 (04)
  • [46] Generating Perturbation-based Explanations with Robustness to Out-of-Distribution Data
    Qiu, Luyu
    Yang, Yi
    Cao, Caleb Chen
    Zheng, Yueyuan
    Ngai, Hilary
    Hsiao, Janet
    Chen, Lei
    PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022, : 3594 - 3605
  • [47] Towards Boosting Out-of-Distribution Detection from a Spatial Feature Importance Perspective
    Zhu, Yao
    Yan, Xiu
    Xie, Chuanlong
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025,
  • [48] Unlock the Potential of Counterfactually-Augmented Data in Out-Of-Distribution Generalization
    Fan, Caoyun
    Chen, Wenqing
    Tian, Jidong
    Li, Yitian
    He, Hao
    Jin, Yaohui
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [49] IMPROVING ROBUSTNESS TO OUT-OF-DISTRIBUTION DATA BY FREQUENCY-BASED AUGMENTATION
    Mukai, Koki
    Kumano, Soichiro
    Yamasaki, Toshihiko
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 3116 - 3120
  • [50] DIAGNOSE: Avoiding Out-of-Distribution Data Using Submodular Information Measures
    Kothawade, Suraj
    Shrivastava, Akshit
    Iyer, Venkat
    Ramakrishnan, Ganesh
    Iyer, Rishabh
    MEDICAL IMAGE LEARNING WITH LIMITED AND NOISY DATA (MILLAND 2022), 2022, 13559 : 141 - 150