Attributive Collocations in the Gold Standard of Russian Collocability and Their Representation in Dictionaries and Corpora

Cited by: 2
Authors
Khokhlova, Maria V. [1]
Affiliation
[1] St Petersburg State Univ, St Petersburg, Russia
Funding
Russian Science Foundation
Keywords
collocations; collocability; attributive collocations; Russian language; dictionaries; text corpora; database
DOI
10.17223/22274200/21/2
Chinese Library Classification number
H [Language, Writing]
Subject classification code
05
Abstract
The article discusses how collocations are represented in Russian dictionaries and how information about them can be captured in a collocation database that is currently under development. Such a resource (a gold standard) can be useful for developing applications for teaching or learning Russian as a foreign language and for addressing other theoretical and applied problems. The aim of the study was twofold: first, to analyze how explanatory and specialized dictionaries of the Russian language represent collocations and, hence, to what extent their data coincide; second, to investigate how these dictionary collocations are reflected in text corpora. This makes it possible to trace the relation between manually collected data and modern corpora. For the study, the author used the disambiguated subcorpus and the main corpus of the Russian National Corpus (RNC), with 6 million and 321 million words, respectively, as well as the large Internet corpus ruTenTen, with more than 14.5 billion words. The author considered attributive phrases built on the "adjective/participle + noun" model. She analyzed 120 collocations with different dictionary indices, where the dictionary index is the number of dictionaries in which a phrase is recorded. The following hypothesis was tested: a high collocation frequency corresponds to the item being recorded in several dictionaries. Nonparametric analogues of analysis of variance (the Friedman and Kruskal-Wallis tests) were used to assess the statistical significance of differences in the quantitative data. The frequencies of collocations in corpora of different sizes and in different dictionaries were compared. In total, more than 15 thousand examples were processed; fewer than 0.5% of them appeared in four of the six reviewed dictionaries (five printed and one electronic).
The results show that the data are heterogeneous: the selection of items for a dictionary does not coincide with their frequency characteristics, and thus the word combinations turn out to be low-frequency. About 34% of the examples are absent from the disambiguated RNC subcorpus, and about 12% of the analyzed collocations are rare (less than 0.01 ipm) even in the ruTenTen corpus. The presence of a collocation in several dictionaries indicates a higher frequency and hence reproducibility in speech. Explanatory dictionaries and collocation dictionaries show the smallest overlap in data. The results show that the amount of data is crucial and that the phenomenon of collocability should be studied on large corpora.
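The rarity threshold in the abstract is stated in ipm (instances per million words). As a minimal sketch of what that threshold means in raw counts, the snippet below normalizes a count by corpus size. The function names and the example counts are hypothetical and are not taken from the article; the corpus sizes are the rounded figures given in the abstract, and the ruTenTen figure is only a lower bound ("more than 14.5 billion words").

```python
# Corpus sizes in words, rounded as stated in the abstract.
RNC_DISAMBIGUATED_SIZE = 6_000_000
RNC_MAIN_SIZE = 321_000_000
RUTENTEN_SIZE = 14_500_000_000  # lower bound: "more than 14.5 billion"

RARE_THRESHOLD_IPM = 0.01  # threshold quoted in the abstract


def ipm(count: int, corpus_size: int) -> float:
    """Return the relative frequency in instances per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return count * 1_000_000 / corpus_size


def is_rare(count: int, corpus_size: int,
            threshold: float = RARE_THRESHOLD_IPM) -> bool:
    """Return True if the relative frequency is below the ipm threshold."""
    return ipm(count, corpus_size) < threshold


if __name__ == "__main__":
    # Hypothetical raw counts, for illustration only.
    for count in (50, 144, 200, 5000):
        print(count, round(ipm(count, RUTENTEN_SIZE), 4),
              is_rare(count, RUTENTEN_SIZE))
```

Under these assumptions, a collocation needs roughly 145 occurrences in a 14.5-billion-word corpus to reach 0.01 ipm, whereas in the 6-million-word disambiguated subcorpus even a single occurrence is well above that value. This is one way to see why the abstract stresses corpus size.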
Pages: 33-68
Number of pages: 36