Natural Language Processing for Ancient Greek

Cited by: 0

Authors:

Stopponi, Silvia [1]

Pedrazzini, Nilo [2]

Peels-Matthey, Saskia [1]

McGillivray, Barbara [3]

Nissim, Malvina [1]

Affiliations:

[1] Univ Groningen, Ctr Language & Cognit Groningen CLCG, POB 716, NL-9700 AS Groningen, Netherlands

[2] Alan Turing Inst, British Lib, London, England

[3] Kings Coll London, Dept Digital Humanities, Strand Campus, London, England

Source:

DIACHRONICA | 2024 / Vol. 41 / Issue 03

Keywords:

Ancient Greek; semantic change; computational linguistics; language models; Natural Language Processing; word embeddings; semantic space;

DOI:

10.1075/dia.23013.sto

Chinese Library Classification (CLC) number:

H0 [Linguistics];

Subject classification codes:

030303 ; 0501 ; 050102 ;

Abstract:

Computational methods have produced meaningful and usable results to study word semantics, including semantic change. These methods, belonging to the field of Natural Language Processing, have recently been applied to ancient languages; in particular, language modelling has been applied to Ancient Greek, the language on which we focus. In this contribution we explain how vector representations can be computed from word co-occurrences in a corpus and can be used to locate words in a semantic space, and what kind of semantic information can be extracted from language models. We compare three different kinds of language models that can be used to study Ancient Greek semantics: a count-based model, a word embedding model and a syntactic embedding model; and we show examples of how the quality of their representations can be assessed. We highlight the advantages and potential of these methods, especially for the study of semantic change, together with their limitations.

Les méthodes computationnelles ont produit des résultats significatifs et utilisables pour étudier la sémantique des mots, y compris le changement sémantique. Ces méthodes, qui appartiennent au domaine du traitement automatique des langues, ont été appliquées récemment aux langues anciennes. Notamment, la modélisation du langage a été appliquée au grec ancien, la langue sur laquelle nous nous concentrons. Dans cette contribution on explique comment des vecteurs de mots peuvent être calculés à partir de cooccurrences dans un corpus et comment ils peuvent être utilisés pour localiser les mots dans un espace sémantique. On explique aussi quel type d'information peut être extrait des modèles de langage. On compare trois différents types de modèles de langue qui peuvent être utilisés pour étudier la sémantique du grec ancien: un modèle à décomptage (count-based), un modèle Word2vec (qui produit des plongements de mots, 'word embeddings') et des plongements de mots enrichis d'information syntactique. On présente des exemples montrant comment la qualité de ces représentations de mots peut être évaluée. On met en évidence les avantages et les potentialités de ces méthodes, notamment pour étudier le changement sémantique, ainsi que leurs limites.

Es hat sich gezeigt, dass rechnerische Methoden aussagekräftige und nutzbare Ergebnisse für die Untersuchung der Wortsemantik, einschließlich semantischer Veränderungen, liefern. Diese Methoden, die zum Bereich des Natural Language Processing gehören, werden seit kurzem auf alte Sprachen angewendet, insbesondere auf das Altgriechische, das hier von Interesse ist. In diesem Beitrag erklären wir, wie Vektordarstellungen aus Wort-Kookkurrenzen in einem Korpus berechnet und zur Lokalisierung von Wörtern in einem semantischen Raum verwendet werden können und welche Art semantischer Informationen aus Sprachmodellen extrahiert werden können. Wir vergleichen drei verschiedene Arten von Sprachmodellen, die zur Untersuchung altgriechischer Semantik angewendet werden können: ein zählbasiertes Modell, ein Worteinbettungsmodell und ein syntaktisches Einbettungsmodell, und wir zeigen an Beispielen, wie die Qualität ihrer Darstellungen bewertet werden kann. Wir zeigen die Vorteile und das Potenzial dieser Methoden auf, insbesondere für die Untersuchung semantischer Veränderungen, beleuchten aber auch deren Grenzen.
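
As a rough illustration of the count-based idea the abstract mentions (vectors computed from word co-occurrences, then compared in a semantic space), the following minimal sketch may help. It is not the authors' implementation: the toy lemmatised "corpus" is invented for this example, the function names `cooccurrence_vectors` and `cosine` are hypothetical, and raw counts are used without any weighting scheme that a real model would probably apply.

```python
from collections import Counter, defaultdict
from math import sqrt

# Invented toy corpus of lemmatised "sentences" (an assumption, not data from the paper).
TOY_CORPUS = [
    ["θεός", "οὐρανός", "βασιλεύς"],
    ["Ζεύς", "οὐρανός", "βασιλεύς"],
    ["ἵππος", "πεδίον", "τρέχω"],
    ["θεός", "Ζεύς", "οὐρανός"],
]


def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words occurring within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, target in enumerate(sent):
            start = max(0, i - window)
            end = min(len(sent), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[target][sent[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(TOY_CORPUS)
# Words that share contexts end up closer together than words that do not.
sim_gods = cosine(vectors["θεός"], vectors["Ζεύς"])
sim_horse = cosine(vectors["θεός"], vectors["ἵππος"])
print(sim_gods > sim_horse)
```

In this toy setting, words that appear in similar contexts receive a higher cosine similarity than words with no shared contexts, which is the intuition behind locating words in a semantic space.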


Pages: 414-435

Number of pages: 22


