Assessing In-context Learning and Fine-tuning for Topic Classification of German Web Data

Cited by: 0
Authors
Scheib, Julian [1 ]
Ulloa, Roberto [2 ]
Spitz, Andreas [1 ]
Affiliations
[1] Univ Konstanz, Dept Comp Sci, Constance, Germany
[2] Univ Konstanz, Cluster Excellence Polit Inequal, Constance, Germany
DOI
Not available
Abstract
Researchers in the political and social sciences often rely on classification models to analyze trends in information consumption by examining browsing histories of millions of webpages. Automated, scalable methods are necessary because manual labeling at this scale is impractical. In this paper, we model the detection of topic-related content as a binary classification task and compare the accuracy of fine-tuned pre-trained encoder models against in-context learning strategies. Using only a few hundred annotated data points per topic, we detect content related to three German policies in a database of scraped webpages. We compare multilingual and monolingual models, as well as zero- and few-shot approaches, and investigate the impact of negative sampling strategies and of combining URL- and content-based features. Our results show that a small sample of annotated data is sufficient to train an effective classifier. Fine-tuning encoder-based models yields better results than in-context learning. Classifiers that use both URL- and content-based features perform best, while URLs alone provide adequate results when content is unavailable.
Pages: 162-176 (15 pages)
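
The fine-tuning setup described in the abstract can be illustrated with a short sketch: a pre-trained multilingual encoder fine-tuned as a binary topic classifier on a small annotated sample, with URL- and content-based features combined by concatenating the two strings. This is a minimal sketch under stated assumptions, not the authors' implementation: the model choice (xlm-roberta-base), the field names, the toy examples, and the hyperparameters are all illustrative.

import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "xlm-roberta-base"  # assumed multilingual encoder; a German
                                 # monolingual model would be swapped in here
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                           num_labels=2)

class WebpageDataset(Dataset):
    """Annotated webpages: each example holds a URL, scraped page content,
    and a binary label (1 = related to the target policy topic)."""
    def __init__(self, examples):
        self.examples = examples

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        ex = self.examples[idx]
        # Combine URL- and content-based features by concatenating both
        # strings around the tokenizer's separator token.
        text = ex["url"] + f" {tokenizer.sep_token} " + ex["content"]
        enc = tokenizer(text, truncation=True, max_length=512,
                        padding="max_length")
        return {"input_ids": torch.tensor(enc["input_ids"]),
                "attention_mask": torch.tensor(enc["attention_mask"]),
                "labels": torch.tensor(ex["label"])}

# Hypothetical toy examples; in practice these would be the few hundred
# annotated data points per topic mentioned in the abstract.
train_examples = [
    {"url": "https://example.de/politik/klimapaket",
     "content": "Der Bundestag hat das Klimapaket beschlossen ...",
     "label": 1},
    {"url": "https://example.de/sport/bundesliga",
     "content": "Am Wochenende stehen drei Spiele an ...",
     "label": 0},
]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="topic-classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=WebpageDataset(train_examples),
)
trainer.train()

The in-context learning baseline mentioned in the abstract would instead prompt an instruction-tuned LLM with zero or a few labelled URL-content pairs and ask for a binary relevance judgment; the fine-tuned encoder route sketched above is the one the abstract reports as more accurate.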