Efficient corpus development for lexicography: building the New Corpus for Ireland

Times cited: 10

Authors:

Kilgarriff, Adam [1]

Rundell, Michael

Dhonnchadha, Elaine Ui

Affiliations:

[1] Lexicography MasterClass Ltd, Brighton, East Sussex, England

[2] Trinity College Dublin, Dublin, Ireland

Source:

LANGUAGE RESOURCES AND EVALUATION | 2006 / Vol. 40 / No. 2

Keywords:

corpus linguistics; lexicography; computational linguistics; natural language processing; dictionaries; Irish; Gaelic; Hiberno-English; language technology

DOI:

10.1007/s10579-006-9011-7

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Subject classification codes:

081203; 0835

Abstract:

In a 12-month project we have developed a new, register-diverse, 55-million-word bilingual corpus, the New Corpus for Ireland (NCI), to support the creation of a new English-to-Irish dictionary. The paper describes the strategies we employed and the solutions to the problems we encountered. We believe we have a good model for corpus creation for lexicography, and others may find it useful as a blueprint. The corpus has two parts, one Irish and the other Hiberno-English (English as spoken in Ireland). We describe its design, collection and encoding.


Pages: 127-152

Page count: 26
