From tweets to trends: analyzing sociolinguistic variation and change using the Twitter Corpus of English in Hong Kong (TCOEHK)

Cited by: 2
Authors
Gonzales, Wilkinson Daniel Wong [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept English, Hong Kong, Peoples R China
Keywords
English in Hong Kong; language variation and change; regional variation; Bayesian and deep learning methods; language and social media; IDENTITY; MODELS;
DOI
10.1080/13488678.2023.2251771
Chinese Library Classification number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
This article presents the Twitter Corpus of English in Hong Kong (TCOEHK): a 123-million-word corpus derived from sampling tweets across the 18 districts and three geographical (macro-)regions of Hong Kong from 2010 to 2022. It introduces the corpus and demonstrates its utility by examining four linguistic variables found in English in Hong Kong (EngHK) and the dominant variety Hong Kong English (HKE): tense marking, '-ize/-ise' suffix use, adverb syntactic position, and copula (non-)use. It explores their relationship with intralinguistic, stylistic (e.g. formality), and extralinguistic factors (e.g. region, year, affect). The findings show that the distribution of variants in all four variables (e.g. rates of -ize use) is similar to the patterns identified in prior HKE work. In addition to confirming previous research, the results also reveal how intralinguistic, stylistic, and extralinguistic factors can each influence the distribution of variants differently depending on the variable studied, highlighting the complex and ever-changing nature of EngHK. The availability of social metadata and the large size of the TCOEHK make it viable for examining (socio)linguistic variation and changes in contemporary (Twitter-style) EngHK, as well as potential regional and social sub-varieties/styles within EngHK. It promises to advance research on variation and change in EngHK.
Pages: 1-24
Number of pages: 24