An algorithm to identify periods of establishment and obsolescence of linguistic items in a diachronic corpus

Times cited: 0
Authors
Cunha, Evandro L. T. P. [1]
Wichmann, Soren [2]
Affiliations
[1] Univ Fed Minas Gerais, Av Antonio Carlos 6627, BR-31270901 Belo Horizonte, MG, Brazil
[2] Kazan Fed Univ, Kremlyovskaya St 18, Kazan 420000, Russia
Keywords
COHA; diachronic corpus linguistics; English; lexical change; neologism; obsolete word; frequency; evolution; words
DOI
10.3366/cor.2021.0218
Chinese Library Classification
H0 [Linguistics]
Subject classification codes
030303; 0501; 050102
Abstract
When exploring diachronic corpora, it is often beneficial for linguists to pinpoint not only the first or last attestation dates of certain linguistic items, but also the moments in which they become more strongly established in the corpus or, conversely, the moments in which they, despite still being part of the language, become obsolete. In this paper, we propose an algorithm to assist in identifying such periods based on the frequency of items in a corpus. Our simple and generalisable algorithm can be used to investigate any linguistic item in any corpus that is divided into timeframes. We also demonstrate the applicability of our method using lexical data from the Corpus of Historical American English (COHA), providing case studies on the statistics and characteristics of words that appear in or disappear from this corpus in different periods.
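The abstract does not spell out the algorithm's actual criteria. Purely as an illustration of the kind of input (per-timeframe frequencies) and output (an establishment period and an obsolescence period) involved, and not as the authors' method, the Python sketch below assumes a hypothetical fixed frequency threshold: an item counts as established in the first timeframe where its per-million frequency reaches the threshold, and as obsolete from the first timeframe after which it stays below the threshold until the end of the corpus. The function names `per_million` and `find_periods`, the threshold value, and the counts are all invented for this example.

```python
from typing import List, Optional, Sequence, Tuple


def per_million(counts: Sequence[int], totals: Sequence[int]) -> List[float]:
    """Normalise raw counts to occurrences per million tokens in each timeframe.

    A timeframe with zero tokens is given a frequency of 0.0.
    """
    return [c / t * 1_000_000 if t else 0.0 for c, t in zip(counts, totals)]


def find_periods(
    freqs: Sequence[float], threshold: float
) -> Tuple[Optional[int], Optional[int]]:
    """Return (establishment_index, obsolescence_index) for one item.

    Hypothetical fixed-threshold rule, for illustration only:
    - establishment: first timeframe whose frequency reaches `threshold`;
    - obsolescence: first timeframe of the trailing run of timeframes, after
      establishment, whose frequencies are all below `threshold`.
    Each index is None if the corresponding event never occurs.
    """
    established = next((i for i, f in enumerate(freqs) if f >= threshold), None)
    if established is None:
        return None, None

    obsolete: Optional[int] = None
    # Walk backwards from the last timeframe while the item stays below the
    # threshold; the last index visited starts the trailing low-frequency run.
    for i in range(len(freqs) - 1, established, -1):
        if freqs[i] >= threshold:
            break
        obsolete = i
    return established, obsolete


# Toy example with invented counts (not COHA data).
decades = ["1900s", "1910s", "1920s", "1930s", "1940s", "1950s"]
counts = [0, 3, 40, 55, 6, 1]
totals = [20_000_000] * len(decades)

start, end = find_periods(per_million(counts, totals), threshold=1.0)
print(decades[start], decades[end])  # prints "1920s 1940s"
```

The threshold here is arbitrary and shifts both boundaries; this simple rule also ignores items that fall below the threshold and later recover.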
Pages: 205-236
Number of pages: 32