Rethinking The Corpus: Moving towards Dynamic Linguistic Resources

Author
Rosenberg, Andrew [1]
Affiliation
[1] CUNY Queens Coll, Dept Comp Sci, Flushing, NY 11367 USA
Keywords
Linguistic Resources; Opinion paper;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The corpus is an invaluable resource in Spoken and Natural Language Processing. Consistent data sets have allowed for empirical evaluation of competing algorithms. The sharing of high-quality annotated linguistic data has enabled participation and experimentation by a wide range of researchers. However, although these annotations are dubbed "gold-standard", many corpora contain labeling errors and idiosyncrasies. The current view of the corpus as a static resource makes correction of errors and other modifications prohibitively difficult. In this paper, we advance a view of the corpus as dynamically changing. We highlight the problems of the static view of the corpus through case studies of the Penn Treebank, Switchboard, Hub-4, and the Boston University Radio News Corpus. We propose the use of version control software as a mechanism to facilitate this dynamic view.
Pages: 1390-1393
Page count: 4