SUMEX: A hybrid framework for Semantic textUal siMilarity and EXplanation generation

Authors
Saeed, Sumaira [1]
Rajput, Quratulain [1]
Haider, Sajjad [1]
Affiliations
[1] Univ Karachi, Inst Business Adm, Artificial Intelligence Lab, Univ Rd, Karachi 75270, Pakistan
Keywords
Semantic Textual Similarity (STS); Explanation generation; Natural language processing; Embeddings; Clinical notes; Ontology
DOI
10.1016/j.ipm.2024.103771
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Measuring semantic similarity between two pieces of text is a widely known problem in natural language processing (NLP). It has many applications, such as finding similar medical notes of patients to accelerate the diagnosis process, plagiarism detection, and document clustering. Most state-of-the-art models are based on machine/deep learning and lack sufficient explanations for their results, limiting their adoption in critical domains like healthcare. This paper presents a hybrid framework, SUMEX (Semantic textUal siMilarity and EXplanation generation), that uniquely combines ontology with a state-of-the-art embedding-based model for semantic textual similarity. The primary strength of the framework is that it explains its results in human-understandable natural language, which is vital in critical domains such as healthcare. Experiments were conducted on two datasets of clinical notes using four embeddings: ScispaCy, BioWord2Vec, ClinicalBERT, and a customized Word2Vec trained on clinical notes. The SUMEX framework outperforms the embedding-based model on the ClinicalSTS benchmark datasets, improving average precision scores by 7% and reducing the false-positive rate by 23%. On the Patients Similarity Dataset, SUMEX improved the average top-five and top-three precision scores by 14% and 10%, respectively. SUMEX also generates natural-language explanations for its results, and domain experts evaluated their quality. The results show that the generated explanations are of good quality, scoring 90% and 93% on the measures of Completeness and Correctness, respectively. In addition, ChatGPT was used to produce similarity scores and explanations; in the experiments, the SUMEX framework performed better than ChatGPT.
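The abstract reports results in terms of embedding-based similarity, top-k precision, and false-positive rate. As a rough illustration of what those quantities measure, the following minimal Python sketch computes each one using their standard textbook definitions. It is not the authors' implementation: the function names, the toy inputs, and the choice of cosine similarity as the embedding comparison are all assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (assumed comparison)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that are actually relevant."""
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / len(top_k)


def false_positive_rate(predicted, actual):
    """FP / (FP + TN) over paired boolean predictions and gold labels."""
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    return fp / (fp + tn) if (fp + tn) else 0.0


# Toy data, purely hypothetical.
sim_same = cosine_similarity([1.0, 0.0], [1.0, 0.0])
sim_orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
p_at_3 = precision_at_k(["a", "b", "c", "d", "e"], {"a", "c"}, 3)
fpr = false_positive_rate([True, True, False, False],
                          [True, False, False, False])
```

In this sketch a lower false-positive rate means fewer dissimilar note pairs are wrongly flagged as similar, which is the direction of improvement the abstract describes.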
Pages: 22
Related papers
50 in total
  • [21] Efficient Textual Similarity using Semantic MinHashing
    Nawaz, Waqas
    Baig, Maryam
    Khan, Kifayat Ullah
    2024 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING, IEEE BIGCOMP 2024, 2024, : 262 - 269
  • [22] MedSTS: a resource for clinical semantic textual similarity
    Yanshan Wang
    Naveed Afzal
    Sunyang Fu
    Liwei Wang
    Feichen Shen
    Majid Rastegar-Mojarad
    Hongfang Liu
    Language Resources and Evaluation, 2020, 54 : 57 - 72
  • [23] Interpretable Semantic Textual Similarity for Indonesian Sentence
    Rajagukguk, Rio Chandra
    Khodra, Masayu Leylia
    2018 5TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATICS: CONCEPTS, THEORY AND APPLICATIONS (ICAICTA 2018), 2018, : 147 - 152
  • [24] Textual entailment beyond semantic similarity information
    Vazquez, Sonia
    Kozareva, Zornitsa
    Montoyo, Andres
    MICAI 2006: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4293 : 900 - +
  • [25] Collective Human Opinions in Semantic Textual Similarity
    Wang, Yuxia
    Tao, Shimin
    Xie, Ning
    Yang, Hao
    Baldwin, Timothy
    Verspoor, Karin
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2023, 11 : 997 - 1013
  • [26] Semantic Textual Similarity Using Various Approaches
    Kazula, Maciej
    Kozlowski, Marek
    MACHINE INTELLIGENCE AND BIG DATA IN INDUSTRY, 2016, 19 : 49 - 62
  • [27] Linking Datasets Using Semantic Textual Similarity
    McCrae, John P.
    Buitelaar, Paul
    CYBERNETICS AND INFORMATION TECHNOLOGIES, 2018, 18 (01) : 109 - 123
  • [28] Czech news dataset for semantic textual similarity
    Sido, Jakub
    Sejak, Michal
    Prazak, Ondrej
    Konopik, Miloslav
    Moravec, Vaclav
    LANGUAGE RESOURCES AND EVALUATION, 2024
  • [29] A Quantum Genetic Algorithm for Building a Semantic Textual Similarity Estimation Framework for Plagiarism Detection Applications
    Darwish, Saad M.
    Mhaimeed, Ibrahim Abdullah
    Elzoghabi, Adel A.
    ENTROPY, 2023, 25 (09)
  • [30] A Random Walk Framework to Compute Textual Semantic Similarity: a Unified Model for Three Benchmark Tasks
    Yazdani, Majid
    Popescu-Belis, Andrei
    2010 IEEE FOURTH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2010), 2010, : 424 - 429