Assessing In-context Learning and Fine-tuning for Topic Classification of German Web Data

Cited by: 0
Authors
Scheib, Julian [1 ]
Ulloa, Roberto [2 ]
Spitz, Andreas [1 ]
Affiliations
[1] Univ Konstanz, Dept Comp Sci, Constance, Germany
[2] Univ Konstanz, Cluster Excellence Polit Inequal, Constance, Germany
Keywords: none listed
DOI: not available
Abstract
Researchers in the political and social sciences often rely on classification models to analyze trends in information consumption by examining the browsing histories of millions of webpages. Automated, scalable methods are necessary because manual labeling is impractical at this scale. In this paper, we model the detection of topic-related content as a binary classification task and compare the accuracy of fine-tuned pre-trained encoder models against in-context learning strategies. Using only a few hundred annotated data points per topic, we detect content related to three German policies in a database of scraped webpages. We compare multilingual and monolingual models as well as zero- and few-shot approaches, and investigate the impact of negative sampling strategies and the combination of URL- and content-based features. Our results show that a small sample of annotated data is sufficient to train an effective classifier. Fine-tuning encoder-based models yields better results than in-context learning. Classifiers using both URL- and content-based features perform best, while URLs alone provide adequate results when page content is unavailable.
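As a minimal sketch of the URL & content feature combination described in the abstract (the helper names, tokenization rules, and separator are illustrative assumptions, not the authors' exact pipeline), one could derive word tokens from the URL and prepend them to the page text before passing the combined string to a classifier:

```python
import re


def url_tokens(url: str) -> list[str]:
    """Split a URL into lowercase word tokens, dropping the scheme,
    punctuation separators, and purely numeric fragments."""
    url = re.sub(r"^https?://", "", url.lower())
    return [t for t in re.split(r"[/\-_.?=&]+", url) if t and not t.isdigit()]


def build_input(url: str, content: str, sep: str = " [SEP] ") -> str:
    """Combine URL-derived tokens with the page content into a single
    classifier input string (URL-only input works when content is empty)."""
    return " ".join(url_tokens(url)) + sep + content


# Hypothetical example: a German government page about the minimum wage.
example = build_input(
    "https://www.bundesregierung.de/mindestlohn-2022/faq",
    "Der gesetzliche Mindestlohn steigt auf zwölf Euro.",
)
print(example)
```

A string built this way can be fed to any text classifier; in the paper's setting that would be a fine-tuned pre-trained encoder, while a URL-only variant simply passes an empty content string.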
Pages: 162-176 (15 pages)