Assessing In-context Learning and Fine-tuning for Topic Classification of German Web Data

Authors
Scheib, Julian [1]
Ulloa, Roberto [2]
Spitz, Andreas [1]
Affiliations
[1] Univ Konstanz, Dept Comp Sci, Constance, Germany
[2] Univ Konstanz, Cluster Excellence Polit Inequal, Constance, Germany
Abstract
Researchers in the political and social sciences often rely on classification models to analyze trends in information consumption by examining browsing histories that span millions of webpages. Because manual labeling is impractical at this scale, automated, scalable methods are necessary. In this paper, we model the detection of topic-related content as a binary classification task and compare the accuracy of fine-tuned pre-trained encoder models against in-context learning strategies. Using only a few hundred annotated data points per topic, we detect content related to three German policies in a database of scraped webpages. We compare multilingual and monolingual models, as well as zero-shot and few-shot approaches, and investigate the impact of negative sampling strategies and of combining URL-based and content-based features. Our results show that a small sample of annotated data is sufficient to train an effective classifier. Fine-tuning encoder-based models yields better results than in-context learning. Classifiers using both URL-based and content-based features perform best, while using URLs alone provides adequate results when content is unavailable.
Pages: 162-176
Number of pages: 15