Rethinking The Corpus: Moving towards Dynamic Linguistic Resources

Cited by: 0
Author
Rosenberg, Andrew [1]
Affiliation
[1] CUNY Queens Coll, Dept Comp Sci, Flushing, NY 11367 USA
Keywords
Linguistic Resources; Opinion paper
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
The corpus is an invaluable resource in Spoken and Natural Language Processing. Consistent data sets have allowed for empirical evaluation of competing algorithms. The sharing of high-quality annotated linguistic data has enabled participation and experimentation by a wide range of researchers. However, despite dubbing these annotations as "gold-standard", many corpora contain labeling errors and idiosyncrasies. The current view of the corpus as a static resource makes correction of errors and other modifications prohibitively difficult. In this paper, a perspective of the corpus as dynamically changing is advanced. We highlight the problems of the static view of the corpus through case studies of the Penn Treebank, Switchboard, Hub-4 and Boston University Radio News Corpus. We propose the use of version control software as a mechanism to facilitate this dynamic view.
Pages: 1390-1393
Page count: 4