Rethinking The Corpus: Moving towards Dynamic Linguistic Resources

Cited by: 0
Author
Rosenberg, Andrew [1]
Affiliation
[1] CUNY Queens Coll, Dept Comp Sci, Flushing, NY 11367 USA
Keywords
Linguistic Resources; Opinion paper
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The corpus is an invaluable resource in Spoken and Natural Language Processing. Consistent data sets have allowed for empirical evaluation of competing algorithms. The sharing of high-quality annotated linguistic data has enabled participation and experimentation by a wide range of researchers. However, despite dubbing these annotations as "gold-standard", many corpora contain labeling errors and idiosyncrasies. The current view of the corpus as a static resource makes correction of errors and other modifications prohibitively difficult. In this paper, a perspective of the corpus as dynamically changing is advanced. We highlight the problems of the static view of the corpus through case studies of the Penn Treebank, Switchboard, Hub-4 and Boston University Radio News Corpus. We propose the use of version control software as a mechanism to facilitate this dynamic view.
Pages: 1390-1393
Number of pages: 4