SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding

Times cited: 1
Authors
Bashlovkina, Vasilisa [1]
Matthews, Riley [1]
Kuang, Zhaobin [1]
Baumgartner, Simon [1]
Bendersky, Michael [1]
Affiliation
[1] Google Research, New York, NY 10036, USA
Keywords
language modeling; social media; transfer learning; T5; datasets; neural networks
DOI
10.1145/3580305.3599907
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
We study the ability of transformer-based language models (LMs) to understand social media language. Social media (SM) language is distinct from standard written language, yet existing benchmarks fall short of capturing LM performance in this socially, economically, and politically important domain. We quantify the degree to which social media language differs from conventional language and conclude that the difference is significant both in terms of token distribution and rate of linguistic shift. Next, we introduce a new benchmark for Social MedIa Language Evaluation (SMILE) that covers four SM platforms and eleven tasks. Finally, we show that learning a tokenizer and pretraining on a mix of social media and conventional language yields an LM that outperforms the best similar-sized alternative by 4.2 points on the overall SMILE score.
Pages: 3737-3749
Number of pages: 13