Rethinking The Corpus: Moving towards Dynamic Linguistic Resources

Times cited: 0
Author
Rosenberg, Andrew [1]
Affiliation
[1] CUNY Queens Coll, Dept Comp Sci, Flushing, NY 11367 USA
Keywords
Linguistic Resources; Opinion paper;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The corpus is an invaluable resource in spoken and natural language processing. Consistent data sets have allowed for empirical evaluation of competing algorithms, and the sharing of high-quality annotated linguistic data has enabled a wide range of researchers to participate and experiment. However, although these annotations are dubbed "gold standard", many corpora contain labeling errors and idiosyncrasies. The current view of the corpus as a static resource makes correcting errors and making other modifications prohibitively difficult. In this paper, we advance a view of the corpus as a dynamically changing resource. We highlight the problems of the static view through case studies of the Penn Treebank, Switchboard, Hub-4 and the Boston University Radio News Corpus. We propose the use of version control software as a mechanism to facilitate this dynamic view.
Pages: 1390–1393
Page count: 4