Assessing In-context Learning and Fine-tuning for Topic Classification of German Web Data

Cited by: 0
Authors
Scheib, Julian [1 ]
Ulloa, Roberto [2 ]
Spitz, Andreas [1 ]
Affiliations
[1] Univ Konstanz, Dept Comp Sci, Constance, Germany
[2] Univ Konstanz, Cluster Excellence Polit Inequal, Constance, Germany
DOI
Not available
Abstract
Researchers in the political and social sciences often rely on classification models to analyze trends in information consumption by examining browsing histories of millions of webpages. Automated, scalable methods are necessary because manual labeling at this scale is impractical. In this paper, we model the detection of topic-related content as a binary classification task and compare the accuracy of fine-tuned pre-trained encoder models against in-context learning strategies. Using only a few hundred annotated data points per topic, we detect content related to three German policies in a database of scraped webpages. We compare multilingual and monolingual models, as well as zero- and few-shot approaches, and investigate the impact of negative sampling strategies and of combining URL- and content-based features. Our results show that a small sample of annotated data is sufficient to train an effective classifier. Fine-tuning encoder-based models yields better results than in-context learning. Classifiers that use both URL- and content-based features perform best, while URLs alone provide adequate results when content is unavailable.
Pages: 162-176 (15 pages)
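
The fine-tuning setup described in the abstract can be illustrated with a short sketch: a pre-trained multilingual encoder fine-tuned as a binary topic classifier on a small annotated sample, with URL- and content-based features combined by concatenating the two strings. This is a minimal sketch under stated assumptions, not the authors' implementation: the model choice (xlm-roberta-base), the field names, the toy examples, and the hyperparameters are all illustrative.

import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "xlm-roberta-base"  # assumed multilingual encoder; a German
                                 # monolingual model would be swapped in here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                           num_labels=2)

class WebpageDataset(Dataset):
    """Annotated webpages: each example holds a URL, scraped page content,
    and a binary label (1 = related to the target policy topic)."""
    def __init__(self, examples):
        self.examples = examples

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        ex = self.examples[idx]
        # Combine URL- and content-based features by concatenating both
        # strings around the tokenizer's separator token.
        text = ex["url"] + f" {tokenizer.sep_token} " + ex["content"]
        enc = tokenizer(text, truncation=True, max_length=512,
                        padding="max_length")
        return {"input_ids": torch.tensor(enc["input_ids"]),
                "attention_mask": torch.tensor(enc["attention_mask"]),
                "labels": torch.tensor(ex["label"])}

# Hypothetical toy examples; in practice these would be the few hundred
# annotated data points per topic mentioned in the abstract.
train_examples = [
    {"url": "https://example.de/politik/klimapaket",
     "content": "Der Bundestag hat das Klimapaket beschlossen ...",
     "label": 1},
    {"url": "https://example.de/sport/bundesliga",
     "content": "Am Wochenende stehen drei Spiele an ...",
     "label": 0},
]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="topic-classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=WebpageDataset(train_examples),
)
trainer.train()

The in-context learning baseline mentioned in the abstract would instead prompt an instruction-tuned LLM with zero or a few labelled URL-content pairs and ask for a binary relevance judgment; the fine-tuned encoder route sketched above is the one the abstract reports as more accurate.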