The making of the Litkey Corpus, a richly annotated longitudinal corpus of German texts written by primary school children

Authors
Laarmann-Quante, Ronja [1]
Dipper, Stefanie [1]
Belke, Eva [1]
Affiliations
[1] Ruhr Univ Bochum, Fak Philol, Dept Linguist, Bochum, Germany
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To date, corpus and computational linguistic work on written language acquisition has mostly dealt with second language learners who have usually already mastered orthography acquisition in their first language. In this paper, we present the Litkey Corpus, a richly annotated longitudinal corpus of written texts produced by primary school children in Germany from grades 2 to 4. The paper focuses on the (semi-)automatic annotation procedure at various linguistic levels, which include POS tags, features of the word-internal structure (phonemes, syllables, morphemes) and key orthographic features of the target words as well as a categorization of spelling errors. Comprehensive evaluations show that high accuracy was achieved on all levels, making the Litkey Corpus a useful resource for corpus-based research on literacy acquisition of German primary school children and for developing NLP tools for educational purposes. The corpus is freely available at https://www.linguistics.rub.de/litkeycorpus/.
Pages: 43-55
Number of pages: 13