On the fractal patterns of language structures

Cited by: 2
Authors
Ribeiro, Leonardo Costa [1]
Bernardes, Americo Tristao [2]
Mello, Heliana [3]
Affiliations
[1] Univ Fed Minas Gerais, Fac Ciencias Econ, Dept Ciencias Econ, Belo Horizonte, MG, Brazil
[2] Univ Fed Ouro Preto, Dept Fis, Inst Ciencias Exatas & Biol, Ouro Preto, MG, Brazil
[3] Univ Fed Minas Gerais, Fac Letras, Belo Horizonte, MG, Brazil
Source
PLOS ONE | 2023 / Vol. 18 / Issue 05
Keywords
DIVERSITY; LAW;
DOI
10.1371/journal.pone.0285630
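Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract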
Natural Language Processing (NLP) uses Artificial Intelligence algorithms to extract meaningful information from unstructured texts, i.e., content that lacks metadata and cannot easily be indexed or mapped onto standard database fields. Its applications range from sentiment analysis and text summarization to automatic language translation. In this work, we use NLP to identify similar structural linguistic patterns across several languages. We apply the word2vec algorithm, which creates a vector representation of words in a multidimensional space that preserves the meaning relationships between the words. From a large corpus, we built this vector representation in a 100-dimensional space for English, Portuguese, German, Spanish, Russian, French, Chinese, Japanese, Korean, Italian, Arabic, Hebrew, Basque, Dutch, Swedish, Finnish, and Estonian. We then calculated the fractal dimensions of the structure that represents each language. The structures are multifractals with two different dimensions, which we use, together with the token-dictionary size rate of each language, to represent the languages in a three-dimensional space. Finally, analyzing the distances among languages in this space, we conclude that closeness there tends to be related to distance in the phylogenetic tree that depicts the lines of evolutionary descent of the languages from a common ancestor.
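The abstract does not say which estimator was used to obtain the fractal dimensions, so the following is only a minimal sketch of one common choice, the correlation-sum slope: count the fraction of point pairs closer than r, then fit log C(r) against log r. The function names `correlation_sum` and `correlation_dimension`, the toy point set, and the radii are all assumptions made for illustration; real input would be the 100-dimensional word vectors of one language.

```python
import math
from itertools import combinations


def correlation_sum(distances, r):
    """Fraction of point pairs whose distance is smaller than r."""
    return sum(1 for d in distances if d < r) / len(distances)


def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) versus log r.

    Every radius must be large enough that at least one pair falls
    inside it; otherwise log(0) raises a ValueError.
    """
    distances = [math.dist(p, q) for p, q in combinations(points, 2)]
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_sum(distances, r)) for r in radii]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical stand-in for word vectors: 300 evenly spaced points on a
# unit-length straight segment embedded in 3-D, whose dimension is 1.
line = [(0.6 * t, 0.8 * t, 0.0) for t in (i / 299 for i in range(300))]
line_dim = correlation_dimension(line, [0.02, 0.04, 0.08, 0.16])
print(f"estimated dimension of a line: {line_dim:.2f}")
```

On this toy segment the estimate comes out close to 1, as expected for a one-dimensional set. A structure with two scaling regimes, as the abstract reports, would show two different slopes over different ranges of r, so the fit would have to be done separately on each range.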
Pages: 20