Efficient corpus development for lexicography: building the New Corpus for Ireland

Cited by: 10
Authors
Kilgarriff, Adam [1]
Rundell, Michael
Dhonnchadha, Elaine Ui
Affiliations
[1] Lexicog MasterClass Ltd, Brighton, E Sussex, England
[2] Trinity Coll Dublin, Dublin, Ireland
Keywords
corpus linguistics; lexicography; computational linguistics; natural language processing; dictionaries; Irish; Gaelic; Hiberno-English; language technology
DOI
10.1007/s10579-006-9011-7
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In a 12-month project we have developed a new, register-diverse, 55-million-word bilingual corpus, the New Corpus for Ireland (NCI), to support the creation of a new English-to-Irish dictionary. The paper describes the strategies we employed and the solutions to the problems we encountered. We believe we have a good model for corpus creation for lexicography, and others may find it useful as a blueprint. The corpus has two parts, one Irish, the other Hiberno-English (English as spoken in Ireland). We describe its design, collection and encoding.
Pages: 127-152
Page count: 26