An algorithm to identify periods of establishment and obsolescence of linguistic items in a diachronic corpus

Cited by: 0

Authors:

Cunha, Evandro L. T. P. [1]

Wichmann, Soren [2]

Affiliations:

[1] Univ Fed Minas Gerais, Av Antonio Carlos 6627, BR-31270901 Belo Horizonte, MG, Brazil

[2] Kazan Fed Univ, Kremlyovskaya St 18, Kazan 420000, Russia

Source:

CORPORA | 2021 / Vol. 16 / Issue 02

Keywords:

COHA; diachronic corpus linguistics; English; lexical change; neologism; obsolete word; FREQUENCY; EVOLUTION; WORDS;

DOI:

10.3366/cor.2021.0218

Chinese Library Classification (CLC) number:

H0 [Linguistics];

Discipline classification codes:

030303 ; 0501 ; 050102 ;

Abstract:

When exploring diachronic corpora, it is often beneficial for linguists to pinpoint not only the first or the last attestation dates of certain linguistic items, but also the moments in which they become more strongly established in the corpus or, conversely, the moments in which they, despite still being part of the language, become obsolete. In this paper, we propose an algorithm to assist the identification of such periods based on the frequency of items in a corpus. Our simple and generalisable algorithm can be used for the investigation of any linguistic item in any corpus which is divided into timeframes. We also demonstrate the applicability of our method using lexical data from the Corpus of Historical American English (COHA), providing case studies on the statistics and characteristics of words that appear in or disappear from this corpus in different periods.

Pages: 205-236

Page count: 32

