SUMEX: A hybrid framework for Semantic textUal siMilarity and EXplanation generation

Authors
Saeed, Sumaira [1]
Rajput, Quratulain [1]
Haider, Sajjad [1]
Affiliations
[1] Univ Karachi, Inst Business Adm, Artificial Intelligence Lab, Univ Rd, Karachi 75270, Pakistan
Keywords
Semantic Textual Similarity (STS); Explanation generation; Natural language processing; Embeddings; Clinical notes; Ontology
DOI
10.1016/j.ipm.2024.103771
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Measuring semantic similarity between two pieces of text is a well-known problem in natural language processing (NLP). It has many applications, such as finding similar medical notes of patients to accelerate the diagnosis process, plagiarism detection, and document clustering. Most state-of-the-art models are based on machine or deep learning and lack sufficient explanations for their results, which limits their adoption in critical domains such as healthcare. This paper presents a hybrid framework, SUMEX (Semantic textUal siMilarity and EXplanation generation), that combines an ontology with a state-of-the-art embedding-based model for semantic textual similarity. The primary strength of the framework is that it explains its results in human-understandable natural language, which is vital in critical domains such as healthcare. Experiments were conducted on two datasets of clinical notes using four embeddings: ScispaCy, BioWord2Vec, ClinicalBERT, and a customized Word2Vec trained on clinical notes. On the ClinicalSTS benchmark datasets, SUMEX outperforms the embedding-based model, improving average precision scores by 7% and reducing the false-positive rate by 23%. On the Patients Similarity Dataset, SUMEX improved the average top-five and top-three precision scores by 14% and 10%, respectively. Domain experts evaluated the quality of the natural-language explanations that SUMEX generates. The results show that the explanations are of high quality, with scores of 90% and 93% for the measures of Completeness and Correctness, respectively. In addition, ChatGPT was used to compute similarity scores and generate explanations; in these experiments, SUMEX performed better than ChatGPT.
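The abstract reports results as top-k precision and false-positive rate. As a reading aid, the sketch below shows the standard definitions of these two metrics; it is not the authors' evaluation code, and the function names and sample inputs are hypothetical.

```python
from typing import Sequence


def precision_at_k(ranked_relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked items that are relevant.

    ranked_relevance[i] is True when the i-th retrieved item (best first)
    is judged relevant. Hypothetical helper, standard definition.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Items missing from a short ranking count as non-relevant.
    return sum(ranked_relevance[:k]) / k


def false_positive_rate(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """FP / (FP + TN): share of truly dissimilar pairs predicted as similar."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return fp / (fp + tn) if (fp + tn) else 0.0


# Made-up relevance judgments for one query note, best match first.
ranking = [True, False, True, True, False]
top3 = precision_at_k(ranking, 3)
top5 = precision_at_k(ranking, 5)

# Made-up gold labels and predictions for five note pairs.
fpr = false_positive_rate(
    [True, False, False, True, False],
    [True, True, False, False, False],
)
```

With these definitions, a top-three precision score is the share of relevant notes among the three highest-ranked candidates, and a lower false-positive rate means fewer dissimilar pairs are flagged as similar.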
Pages: 22