Efficient corpus development for lexicography: building the New Corpus for Ireland

Cited by: 10
Authors
Kilgarriff, Adam [1]
Rundell, Michael
Dhonnchadha, Elaine Ui
Affiliations
[1] Lexicog MasterClass Ltd, Brighton, E Sussex, England
[2] Trinity Coll Dublin, Dublin, Ireland
Keywords
corpus linguistics; lexicography; computational linguistics; natural language processing; dictionaries; Irish; Gaelic; Hiberno-English; language technology
DOI
10.1007/s10579-006-9011-7
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In a 12-month project we have developed a new, register-diverse, 55-million-word bilingual corpus, the New Corpus for Ireland (NCI), to support the creation of a new English-to-Irish dictionary. The paper describes the strategies we employed and the solutions to the problems we encountered. We believe we have a good model for corpus creation for lexicography, and others may find it useful as a blueprint. The corpus has two parts, one Irish, the other Hiberno-English (English as spoken in Ireland). We describe its design, collection and encoding.
Pages: 127-152
Page count: 26