Analyzing the Adequacy of Readability Indicators to a Non-English Language

Cited by: 2
Authors
Antunes, Helder [1]
Lopes, Carla Teixeira [1, 2]
Affiliations
[1] Univ Porto, Fac Engn, Porto, Portugal
[2] INESC TEC, Porto, Portugal
Keywords
Readability; Portuguese language; Text simplification; Natural language processing
DOI
10.1007/978-3-030-28577-7_10
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Readability is a linguistic feature that indicates how difficult a text is to read. Traditional readability formulas were developed for the English language. This study evaluates their adequacy for the Portuguese language. We applied the traditional formulas to 10 parallel corpora. We found that Portuguese texts received higher grade scores (lower readability) from the formulas that use the number of syllables per word or the number of complex words per sentence. Formulas that use letters per word instead of syllables per word produced similar grade scores. Given this, we evaluated the correlation between complex words and school grade in 65 Portuguese school books spanning 12 years of schooling. We found that defining a complex word as a word with 4 or more syllables, rather than 3 or more syllables as originally used in the traditional formulas for English texts, correlates better with the grade of Portuguese school books. Finally, we adapted each traditional readability formula to the Portuguese language by performing a multiple linear regression on the same dataset of school books.
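The effect of the complex-word threshold described in the abstract can be illustrated with a small sketch. It is not the authors' code. The abstract does not name the formulas studied, so the Gunning Fog index is used here only as one well-known example of a formula based on complex words, in its basic form (without the original exclusions for proper nouns and similar cases). The syllable counter is a rough vowel-group heuristic and an assumption of this sketch; it will miscount some Portuguese diphthongs and hiatuses. The names `count_syllables`, `gunning_fog`, and `complex_threshold` are hypothetical.

```python
import re

# Assumption: vowels (including common Portuguese accented forms) used by a
# naive heuristic. This is NOT a proper Portuguese syllabifier.
VOWELS = "aeiouyáàâãéêíóôõúü"


def count_syllables(word: str) -> int:
    """Rough estimate: one syllable per group of consecutive vowels."""
    groups = re.findall(f"[{VOWELS}]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text: str, complex_threshold: int = 3) -> float:
    """Basic Gunning Fog: 0.4 * (words/sentences + 100 * complex/words).

    complex_threshold is the minimum syllable count for a "complex" word:
    3 in the traditional English definition, 4 in the variant the abstract
    reports as better correlated with Portuguese school grades.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    n_complex = sum(1 for w in words if count_syllables(w) >= complex_threshold)
    return 0.4 * (len(words) / len(sentences) + 100 * n_complex / len(words))


sample = "O gato dorme. A universidade publica resultados."
fog_3 = gunning_fog(sample, complex_threshold=3)  # traditional threshold
fog_4 = gunning_fog(sample, complex_threshold=4)  # threshold from the abstract
print(f"threshold 3: {fog_3:.2f}, threshold 4: {fog_4:.2f}")
```

Raising the threshold from 3 to 4 syllables means fewer words count as complex, which lowers the grade estimate for the same text. The coefficients here are the standard English ones; the regression-adapted Portuguese coefficients are not given in the abstract and are not reproduced.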
Pages: 149-155
Number of pages: 7