Knowledge-based sentence semantic similarity: algebraical properties

Cited by: 6
Authors
Oussalah, Mourad [1]
Mohamed, Muhidin [2]
Affiliations
[1] Univ Oulu, CMVS, Fac Informat Technol & Elect Engn, Oulu 90014, Finland
[2] Aston Univ, Operat & Informat Management Dept, Birmingham, W Midlands, England
Keywords
Sentence semantic similarity; Part-of-speech conversion; WordNet; CatVar
DOI
10.1007/s13748-021-00248-0
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Determining the extent to which two text snippets are semantically equivalent is a well-researched topic in the areas of natural language processing, information retrieval and text summarization. Sentence-to-sentence similarity scoring is extensively used in both generic and query-based summarization of documents as a significance or similarity indicator. Nevertheless, most of these applications use the semantic similarity measure only as a tool, without paying attention to the inherent properties of such tools, which ultimately restrict the scope and technical soundness of the underlying applications. This paper aims to help fill this gap. It investigates three popular WordNet hierarchical semantic similarity measures, namely path-length, Wu and Palmer, and Leacock and Chodorow, in terms of both algebraical and intuitive properties, highlighting their inherent limitations and theoretical constraints. We have especially examined properties related to the range and scope of the semantic similarity score, incremental monotonicity evolution, monotonicity with respect to the hyponymy/hypernymy relationship, as well as a set of interactive properties. The extension from word semantic similarity to sentence similarity has also been investigated using a pairwise canonical extension. Properties of the underlying sentence-to-sentence similarity are examined and scrutinized. Next, to overcome inherent limitations of WordNet semantic similarity in accounting for the various part-of-speech word categories, a WordNet "All word-To-Noun conversion" that makes use of the Categorial Variation Database (CatVar) is put forward and evaluated on a publicly available dataset, with a comparison against some state-of-the-art methods. The findings demonstrate the feasibility of the proposal and open up new opportunities in information retrieval and natural language processing tasks.
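As a rough illustration of the three measures named in the abstract, the sketch below computes them on a small hand-made is-a taxonomy using their commonly stated formulations: path-length as 1 / (1 + shortest path in edges), Wu and Palmer as 2 * depth(LCS) / (depth(a) + depth(b)), and Leacock and Chodorow as -log((path + 1) / (2 * D)), with D the maximum taxonomy depth. The taxonomy, the function names and the averaged best-match form of the sentence-level score are assumptions made for illustration; the paper's exact definitions may differ, and real use would rely on WordNet rather than this toy hierarchy.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

def ancestors(node):
    """Return the chain from node up to the root, node first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))

def lcs(a, b):
    """Lowest common subsumer: the deepest ancestor shared by a and b."""
    seen = set(ancestors(a))
    return next(n for n in ancestors(b) if n in seen)

def path_edges(a, b):
    """Number of edges on the shortest path between a and b via their LCS."""
    c = lcs(a, b)
    return (depth(a) - depth(c)) + (depth(b) - depth(c))

MAX_DEPTH = max(depth(n) for n in PARENT)

def path_sim(a, b):
    """Path-length similarity, in (0, 1]."""
    return 1.0 / (1 + path_edges(a, b))

def wup_sim(a, b):
    """Wu and Palmer similarity, in (0, 1]."""
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))

def lch_sim(a, b):
    """Leacock and Chodorow similarity; not bounded above by 1."""
    return -math.log((path_edges(a, b) + 1) / (2.0 * MAX_DEPTH))

def sentence_sim(s1, s2, word_sim):
    """Assumed pairwise extension: average best match in both directions."""
    def directed(src, dst):
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)
    return 0.5 * (directed(s1, s2) + directed(s2, s1))

if __name__ == "__main__":
    for name, sim in [("path", path_sim), ("wup", wup_sim), ("lch", lch_sim)]:
        print(name, sim("dog", "cat"), sim("dog", "dog"))
    print(sentence_sim(["dog", "car"], ["cat", "car"], wup_sim))
```

In this toy setting, comparing a word with itself gives 1 for the path-length and Wu and Palmer scores but log(2 * D) for Leacock and Chodorow, which is one simple way to see why the range of each score is among the properties the abstract singles out.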
Pages: 43-63
Page count: 21
Related papers
50 in total
  • [1] Knowledge-based sentence semantic similarity: algebraical properties
    Mourad Oussalah
    Muhidin Mohamed
    Progress in Artificial Intelligence, 2022, 11 : 43 - 63
  • [2] Short Tamil Sentence Similarity Calculation using Knowledge-Based and Corpus-Based Similarity Measures
    Selvarasa, Anutharsha
    Thirunavukkarasu, Nilasini
    Rajendran, Niveathika
    Yogalingam, Chinthoorie
    Ranathunga, Surangika
    Dias, Gihan
    2017 3RD INTERNATIONAL MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON), 2017, : 443 - 448
  • [3] KNOWLEDGE-BASED CHINESE SENTENCE RECOGNITION
    ZHENG, YC
    YUAN, BZ
    14TH INTERNATIONAL CONGRESS ON ACOUSTICS, PROCEEDINGS, VOLS 1-4, 1992, : 1113 - 1114
  • [4] A sentence similarity metric based on semantic patterns
    Lee, Ming Che
    Chang, Jia Wei
    Hsieh, Tung Cheng
    Chen, Hui Hui
    Chen, Ching Hui
    Advances in Information Sciences and Service Sciences, 2012, 4 (18): : 576 - 585
  • [5] Sentence Similarity Based on Semantic Vector Model
    Zhao Jingling
    Zhang Huiyun
    Cui Baojiang
    2014 NINTH INTERNATIONAL CONFERENCE ON P2P, PARALLEL, GRID, CLOUD AND INTERNET COMPUTING (3PGCIC), 2014, : 499 - 503
  • [6] Eigenvalue Based Features For Semantic Sentence Similarity
    Vardasbi, Ali
    Faili, Heshaam
    Asadpour, Masoud
    2017 19TH CSI INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING (AISP), 2017, : 184 - 189
  • [7] Computing Knowledge-Based Semantic Similarity from the Web: An Application to the Biomedical Domain
    Sanchez, David
    Batet, Montserrat
    Valls, Aida
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, 2009, 5914 : 17 - 28
  • [8] Improved Sentence Similarity Measurement in the Medical Field Based on Syntactico-Semantic Knowledge
    Wali, Wafa
    Gargouri, Bilel
    INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, ISDA 2021, 2022, 418 : 890 - 899
  • [9] Knowledge-based Semantic Clustering
    Keeney, John
    Jones, Dominic
    Roblek, Dominik
    Lewis, David
    O'Sullivan, Declan
    APPLIED COMPUTING 2008, VOLS 1-3, 2008, : 460 - +
  • [10] Sentence Semantic Similarity based on Word Embedding and WordNet
    Farouk, Mamdouh
    PROCEEDINGS OF 2018 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND SYSTEMS (ICCES), 2018, : 33 - 37