SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding

Cited by: 1
Authors
Bashlovkina, Vasilisa [1]
Matthews, Riley [1]
Kuang, Zhaobin [1]
Baumgartner, Simon [1]
Bendersky, Michael [1]
Affiliations
[1] Google Research, New York, NY 10036, USA
Keywords
language modeling; social media; transfer learning; T5; datasets; neural networks
DOI
10.1145/3580305.3599907
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We study the ability of transformer-based language models (LMs) to understand social media language. Social media (SM) language is distinct from standard written language, yet existing benchmarks fall short of capturing LM performance in this socially, economically, and politically important domain. We quantify the degree to which social media language differs from conventional language and conclude that the difference is significant both in terms of token distribution and rate of linguistic shift. Next, we introduce a new benchmark for Social MedIa Language Evaluation (SMILE) that covers four SM platforms and eleven tasks. Finally, we show that learning a tokenizer and pretraining on a mix of social media and conventional language yields an LM that outperforms the best similar-sized alternative by 4.2 points on the overall SMILE score.
Pages: 3737-3749
Number of pages: 13