Measuring semantic similarity of documents with weighted cosine and fuzzy logic

Cited by: 4
Authors
Huetle-Figueroa, Juan [1]
Perez-Tellez, Fernando [1]
Pinto, David [2]
Affiliations
[1] Technol Univ Dublin, Dept Comp, Blessington Rd, Dublin D24 FKT9, Ireland
[2] Benemerita Univ Autonoma Puebla, PUE, Fac Comp Sci, Puebla, Mexico
Keywords
Semantic similarity; semantic matching; document similarity; cosine enrichment; keyword enrichment
DOI
10.3233/JIFS-179889
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Semantic analysis is currently used in many fields, such as information retrieval, the biomedical domain, and natural language processing. The primary focus of this research work is on using semantic methods, the cosine similarity algorithm, and fuzzy logic to improve the matching of documents. The algorithms were applied to plain texts, in this case CVs (resumes) and job descriptions. WordNet synsets were used to enrich semantic similarity methods such as Wu-Palmer similarity (WUP), Leacock-Chodorow similarity (LCH), and path similarity (hypernym/hyponym). Additionally, keyword extraction was used to create a postings list in which keywords were weighted. This research work discusses the task of recruiting new personnel for companies that publish job descriptions and, reciprocally, of finding a company when workers publish their resumes. A new gold standard was required to compare the proposed methods, so a web application was designed to match the documents manually and thereby create it. The new gold standard confirmed the benefits of enriching the cosine algorithm semantically. Finally, the results were compared with the new gold standard to check the efficiency of the proposed methods. The measures used for the analysis were precision, recall, and F-measure. The analysis concluded that semantically weighted cosine similarity can be used to obtain better similarity scores.
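The abstract names three WordNet-based measures (path, WUP, LCH) without stating their formulas. The sketch below applies their textbook definitions to a tiny hand-built hypernym tree. The taxonomy and every concept name in it are hypothetical stand-ins for WordNet synsets. Real WordNet has multiple inheritance and its own depth conventions, so actual scores would differ. NLTK's WordNet interface exposes these measures as `Synset.path_similarity`, `Synset.wup_similarity`, and `Synset.lch_similarity`. This sketch avoids NLTK so it runs without downloading the corpus.

```python
import math

# Hypothetical toy hypernym hierarchy (child -> parent), standing in for WordNet.
TAXONOMY = {
    "entity": None,
    "person": "entity",
    "worker": "person",
    "engineer": "worker",
    "programmer": "engineer",
    "analyst": "worker",
    "artifact": "entity",
    "computer": "artifact",
}

def hypernym_chain(concept):
    """Return [concept, parent, ..., root]."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = TAXONOMY[concept]
    return chain

def depth(concept):
    # Depth counted in nodes, so the root has depth 1.
    return len(hypernym_chain(concept))

MAX_DEPTH = max(depth(c) for c in TAXONOMY)

def lcs(a, b):
    """Lowest common subsumer: the deepest hypernym shared by a and b."""
    ancestors_b = set(hypernym_chain(b))
    for node in hypernym_chain(a):  # walks upward, so the first hit is deepest
        if node in ancestors_b:
            return node
    raise ValueError(f"{a!r} and {b!r} share no hypernym")

def path_length(a, b):
    """Number of edges on the path from a to b through their LCS."""
    shared = depth(lcs(a, b))
    return (depth(a) - shared) + (depth(b) - shared)

def path_similarity(a, b):
    return 1.0 / (path_length(a, b) + 1)

def wup_similarity(a, b):
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))

def lch_similarity(a, b):
    # Path length in nodes (edges + 1), scaled by twice the taxonomy depth.
    return -math.log((path_length(a, b) + 1) / (2.0 * MAX_DEPTH))

for other in ("analyst", "computer"):
    print(other,
          round(path_similarity("programmer", other), 3),
          round(wup_similarity("programmer", other), 3),
          round(lch_similarity("programmer", other), 3))
```

All three measures rank "analyst" as closer to "programmer" than "computer" is, because the first pair shares the deeper subsumer "worker".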
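The abstract does not give the exact formula for weighting keywords inside the cosine measure. The minimal sketch below assumes one simple scheme: each term's raw frequency is multiplied by a per-keyword weight, and the standard cosine is then computed over the weighted vectors. This scheme is hypothetical and is not necessarily the authors' method. The function name `weighted_cosine`, the `default_weight` parameter, and the example tokens and weights are all illustrative.

```python
import math
from collections import Counter

def weighted_cosine(doc_a, doc_b, weights, default_weight=1.0):
    """Cosine similarity of two token lists after scaling term frequencies
    by per-keyword weights (a hypothetical weighting scheme)."""
    tf_a, tf_b = Counter(doc_a), Counter(doc_b)
    vocab = set(tf_a) | set(tf_b)
    # Weighted term-frequency vectors over the combined vocabulary.
    vec_a = {t: tf_a[t] * weights.get(t, default_weight) for t in vocab}
    vec_b = {t: tf_b[t] * weights.get(t, default_weight) for t in vocab}
    dot = sum(vec_a[t] * vec_b[t] for t in vocab)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)

# Hypothetical CV and job description, already tokenized.
cv = "python sql machine learning python".split()
job = "python machine learning statistics".split()
keyword_weights = {"python": 2.0, "machine": 1.5, "learning": 1.5}
print(round(weighted_cosine(cv, job, {}), 3))               # unweighted baseline
print(round(weighted_cosine(cv, job, keyword_weights), 3))  # keywords boosted
```

Raising the weight of shared keywords pulls the score toward 1. A semantic variant could, for example, derive those weights from WordNet similarity between terms. That step is left out here because the abstract does not specify it.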
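Precision, recall, and F-measure score a method's document matches against a manually built gold standard. The short sketch below assumes that a CV-job pair counts as a predicted match when its similarity score reaches a threshold. That assumption, the pair IDs, the scores, and the 0.6 threshold are all invented for illustration and do not come from the paper.

```python
def predicted_matches(scores, threshold):
    """Keep the (cv_id, job_id) pairs whose similarity reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}

def precision_recall_f(predicted, gold):
    """Standard precision, recall, and balanced F-measure over sets of pairs."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical similarity scores and gold-standard matches.
scores = {
    ("cv1", "job1"): 0.82,
    ("cv2", "job3"): 0.61,
    ("cv3", "job1"): 0.74,
    ("cv2", "job2"): 0.40,
}
gold = {("cv1", "job1"), ("cv2", "job2"), ("cv3", "job1")}

p, r, f = precision_recall_f(predicted_matches(scores, 0.6), gold)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
# prints precision=0.667 recall=0.667 F=0.667
```

Here two of the three predicted pairs appear in the gold standard, and two of the three gold pairs are recovered. Moving the threshold trades precision against recall.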
Pages: 2263-2278
Number of pages: 16
Related papers
  • [1] A Taxonomy based Semantic Similarity of Documents using the Cosine Measure
    Madylova, Ainura
    Oguducu, Sule Guenduez
    2009 24th International Symposium on Computer and Information Sciences, 2009: 129-134
  • [2] Intuitionistic Fuzzy Ordered Weighted Cosine Similarity Measure
    Zhou, Ligang
    Tao, Zhifu
    Chen, Huayou
    Liu, Jinpei
    Group Decision and Negotiation, 2014, 23 (04): 879-900
  • [3] Fuzzy lattice neurocomputing using weighted cosine similarity measure
    Cripps, Al
    Nguyen, Nghiep
    2007 IEEE International Joint Conference on Neural Networks, Vols 1-6, 2007: 236+
  • [4] A methodology for measuring structure similarity of fuzzy XML documents
    Zhao, Zhen
    Ma, Zongmin
    Computing, 2017, 99 (05): 493-506
  • [5] Measuring Semantic Similarity between Words Using Web Documents
    Takale, Sheetal A.
    Nandgaonkar, Sushma S.
    International Journal of Advanced Computer Science and Applications, 2010, 1 (04): 78-85
  • [6] Generation of Frequent Itemset Using Fuzzy Weighted Tree with Cosine Similarity
    Satoto, Budi Dwi
    Advanced Science Letters, 2017, 23 (12): 12354-12358
  • [7] A fuzzy approach for measuring the semantic similarity between words in WordNet
    Song, Ling
    Ma, Jun
    Lei, Jingsheng
    Li, Chao
    Journal of Information and Computational Science, 2009, 6 (03): 1673-1680
  • [8] A multilingual fuzzy approach for classifying Twitter data using fuzzy logic and semantic similarity
    Madani, Youness
    Erritali, Mohammed
    Bengourram, Jamaa
    Sailhan, Francoise
    Neural Computing & Applications, 2020, 32 (12): 8655-8673