An algorithm for calculating the degree of similarity between English words through the different position and appearance coefficients of letters

Times cited: 0

Authors:

Ruan, Chunyan [1]

Qu, Wen [2]

Luo, Jianfeng [3]

Lu, Kuan-Han [4]

Affiliations:

[1] Dongguan City Coll, Sch Foreign Languages, Dongguan 523419, Guangdong, Peoples R China

[2] Gannan Univ Sci & Technol, Dept Informat Engn, Ganzhou 341000, Jiangxi, Peoples R China

[3] Dongguan Polytech, Dept Comp Engn, Dongguan 523808, Guangdong, Peoples R China

[4] Soochow Univ, Management, Comp Sci & Informat, Taipei 11490, Taiwan

Source:

JOURNAL OF SUPERCOMPUTING | 2022 / Vol. 78 / No. 14

Keywords:

Near-form words; Letter coding; Code arrangement sequence; Different letter position coefficient; Different letter appearance coefficient; Counterfeit commodity name

DOI:

10.1007/s11227-022-04511-6

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The concept of "near-form words" has existed since the Old English period (beginning around AD 450), yet few mathematical identification algorithms have been applied to them. As English has spread and its vocabulary has grown, the number of near-form words has also increased, and traditional ways of identifying them cannot keep pace with this ever-growing language. A mathematical algorithm is therefore needed that can calculate the degree of similarity between words, so that near-form words can be identified, collected and classified by appearance similarity, with a specific value assigned to each level of similarity. In related fields, there have been many studies of English synonyms, phonetic words, English sentences and texts. Some algorithms have addressed similarity in word appearance, but only for logographic scripts such as Chinese, not for English words. Lists of similar words can be found in dictionaries or online, but they are incomplete because they are compiled subjectively. More importantly, subjective collection cannot quantify the degree of similarity, which highlights the uniqueness and innovation of this research. Among existing methods, the most common relies on fuzzy neural networks, which are unstable and inaccurate, so a stable and unique mathematical calculation method is needed. In this study, coding methods were used to design an algorithm that calculates different letter position coefficients and letter appearance coefficients to obtain corresponding similarity values. In English teaching, the algorithm can help generate big data on near-form words. In English input-method software, it can supply more candidate words as input prompts. In text-editing software (such as Microsoft Word), it can improve error-detection accuracy and suggest suitable alternatives. In artificial intelligence, it can also be used to monitor counterfeit trademark registration in commodity registration systems. The authors therefore believe the algorithm will find a wide range of applications in the future.
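The abstract does not give the formulas for the two coefficients. Purely as an illustration of the idea, and not the authors' method, the sketch below assumes a simple letter code (a=1 ... z=26), a hypothetical position coefficient (share of aligned positions whose codes differ, with extra letters counted as differing) and a hypothetical appearance coefficient (share of letter occurrences with no match in the other word), combined with an assumed equal weighting. All function names and the weighting are hypothetical.

```python
from collections import Counter


def letter_codes(word: str) -> list[int]:
    """Hypothetical coding: map a..z to 1..26; non-letters are dropped."""
    return [ord(c) - ord("a") + 1 for c in word.lower() if "a" <= c <= "z"]


def position_difference(a: list[int], b: list[int]) -> float:
    """Hypothetical position coefficient: share of aligned positions whose
    codes differ; extra letters in the longer word count as differing."""
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    diff = sum(1 for x, y in zip(a, b) if x != y) + abs(len(a) - len(b))
    return diff / n


def appearance_difference(a: list[int], b: list[int]) -> float:
    """Hypothetical appearance coefficient: share of letter occurrences that
    have no match in the other word (multiset symmetric difference)."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    ca, cb = Counter(a), Counter(b)
    unmatched = sum(((ca - cb) + (cb - ca)).values())
    return unmatched / total


def similarity(w1: str, w2: str, weight: float = 0.5) -> float:
    """Hypothetical score in [0, 1]; 1.0 means identical letter codes."""
    a, b = letter_codes(w1), letter_codes(w2)
    d = weight * position_difference(a, b) + (1 - weight) * appearance_difference(a, b)
    return 1.0 - d
```

For example, `similarity("form", "from")` returns 0.75: the two words contain the same letters (appearance difference 0) but two of four positions differ (position difference 0.5). Keeping the two coefficients separate lets anagram-like pairs and single-substitution pairs be distinguished before they are combined.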

Pages: 15974-15994

Page count: 21