Text corpus with errors

Authors
Pala, K [1]
Rychly, P [1]
Smrz, P [1]
Affiliation
[1] Masaryk Univ, Fac Informat, Brno 60200, Czech Republic
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a Czech text corpus (Chyby) containing various kinds of errors: spelling, typographical, grammatical, stylistic, and lexical. We explain how Chyby was built and how the errors in it were discovered, marked, and annotated. The classification of the errors is presented, and statistics on the error types are given. The tools for annotating the errors are also described. To the best of our knowledge, this is the first text corpus of this sort prepared for Czech.
Pages: 90-97
Number of pages: 8