Understanding Out-of-distribution: A Perspective of Data Dynamics

Cited by: 0
Authors
Adila, Dyah [1]
Kang, Dongyeop [2]
Affiliations
[1] Univ Wisconsin Madison, Dept Comp Sci, Madison, WI 53706 USA
[2] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
Source
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite machine learning models' success in Natural Language Processing (NLP) tasks, predictions from these models frequently fail on out-of-distribution (OOD) samples. Prior works have focused on developing state-of-the-art methods for detecting OOD. The fundamental question of how OOD samples differ from in-distribution samples remains unanswered. This paper explores how data dynamics in training models can be used to understand the fundamental differences between OOD and in-distribution samples in extensive detail. We found that syntactic characteristics of the data samples that the model consistently predicts incorrectly in both OOD and in-distribution cases directly contradict each other. In addition, we observed preliminary evidence supporting the hypothesis that models are more likely to latch onto trivial syntactic heuristics (e.g., overlap of words between two sentences) when making predictions on OOD samples. We hope our preliminary study accelerates the data-centric analysis of various machine learning phenomena.
Pages: 1 / 8
Page count: 8