Text corpus with errors

Authors
Pala, K [1]
Rychly, P [1]
Smrz, P [1]
Affiliation
[1] Masaryk Univ, Fac Informat, Brno 60200, Czech Republic
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a Czech text corpus (Chyby) containing various kinds of errors: spelling, typographical, grammatical, stylistic, and lexical. We explain how Chyby was built and how the errors in it were discovered, marked, and annotated. The classification of the errors is presented, and statistics on the error types are given. The tools for annotating the errors are also described. To the best of our knowledge, this is the first text corpus of this sort prepared for Czech.
Pages: 90-97
Number of pages: 8