Towards Reliable Drift Detection and Explanation in Text Data

Cited by: 0
Authors
Feldhans, Robert [1]
Hammer, Barbara [1]
Affiliations
[1] Bielefeld Univ, Bielefeld, Germany
Keywords
Drift Explanation; Text Data; Transformer; Visualization
DOI
10.1007/978-3-031-77731-8_28
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Once deployed, machine learning models face new data whose characteristics may differ from those seen in training, a phenomenon known as concept drift. Because this can degrade performance, such drift must be detected and, where required, the model adapted accordingly. While a variety of drift detection and adaptation methods exist for standard vectorial data, the treatment of text data is less well researched. In this work we present a novel approach that detects and explains drift in text data based on their representation via transformer embeddings. In a nutshell, the method generates suitable statistical features from the original distribution and its possibly shifted variant. Based on these representations, drift scores can be assigned to individual data points, allowing a visualization and human-readable characterization of the type of drift. We demonstrate the approach's effectiveness in reliably detecting drift in several experiments.
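The idea of assigning drift scores to individual data points can be illustrated with a minimal sketch. This is not the authors' method, whose statistical features are not specified in the abstract; it swaps in a generic k-nearest-neighbour score as a stand-in. The function name `knn_drift_scores`, the parameter `k`, and the synthetic Gaussian vectors (used here in place of real transformer embeddings) are all assumptions made for illustration.

```python
import numpy as np


def knn_drift_scores(reference, current, k=10):
    """Hypothetical helper: score each point in `current` by the fraction
    of its k nearest neighbours (in the pooled sample) that also come
    from `current`. Scores far above the pooled share of `current`
    points suggest the point lies in a region the reference data lacks."""
    pooled = np.vstack([reference, current])
    is_current = np.concatenate([
        np.zeros(len(reference), dtype=bool),
        np.ones(len(current), dtype=bool),
    ])
    # Squared Euclidean distance from every current point to every pooled point.
    diffs = current[:, None, :] - pooled[None, :, :]
    dists = np.einsum("ijk,ijk->ij", diffs, diffs)
    # A point must not count as its own neighbour.
    self_idx = len(reference) + np.arange(len(current))
    dists[np.arange(len(current)), self_idx] = np.inf
    neighbours = np.argsort(dists, axis=1)[:, :k]
    return is_current[neighbours].mean(axis=1)


# Synthetic stand-ins for embeddings (assumption: real use would embed text
# with a transformer model first).
rng = np.random.default_rng(0)
dim = 16
reference = rng.normal(0.0, 1.0, size=(200, dim))   # original distribution
undrifted = rng.normal(0.0, 1.0, size=(100, dim))   # new data, same distribution
drifted = rng.normal(3.0, 1.0, size=(100, dim))     # new data, shifted mean
current = np.vstack([undrifted, drifted])

scores = knn_drift_scores(reference, current, k=10)
print("mean score, undrifted half:", scores[:100].mean())
print("mean score, drifted half:  ", scores[100:].mean())
```

In this toy setup the shifted half of the new data should receive markedly higher scores than the unshifted half, which is the kind of per-point signal that could then be visualized or summarized.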
Pages: 301-312
Page count: 12