SUMEX: A hybrid framework for Semantic textUal siMilarity and EXplanation generation

Cited by: 0
Authors
Saeed, Sumaira [1]
Rajput, Quratulain [1]
Haider, Sajjad [1]
Affiliations
[1] Univ Karachi, Inst Business Adm, Artificial Intelligence Lab, Univ Rd, Karachi 75270, Pakistan
Keywords
Semantic Textual Similarity (STS); Explanation generation; Natural language processing; Embeddings; Clinical notes; Ontology
DOI
10.1016/j.ipm.2024.103771
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Measuring semantic similarity between two pieces of text is a widely known problem in natural language processing (NLP). It has many applications, such as finding similar medical notes of patients to accelerate the diagnosis process, plagiarism detection, and document clustering. Most state-of-the-art models are based on machine/deep learning and lack sufficient explanations for their results, limiting their adoption in critical domains like healthcare. This paper presents a hybrid framework, SUMEX (Semantic textUal siMilarity and EXplanation generation), that uniquely combines an ontology with a state-of-the-art embedding-based model for semantic textual similarity. The primary strength of the framework is that it explains its results in human-understandable natural language, which is vital in critical domains such as healthcare. Experiments were conducted on two datasets of clinical notes using four embeddings: ScispaCy, BioWord2Vec, ClinicalBERT, and a customized Word2Vec trained on clinical notes. The SUMEX framework outperforms the embedding-based model on the ClinicalSTS benchmark datasets, improving average precision scores by 7% and reducing the false-positive rate by 23%. On the Patients Similarity Dataset, SUMEX improved the average top-five and top-three precision scores by 14% and 10%, respectively. SUMEX also generates natural-language explanations for its results, and domain experts evaluated the quality of these explanations. The results show that the generated explanations are of good quality, scoring 90% and 93% on the measures of Completeness and Correctness, respectively. In addition, ChatGPT was used to compute similarity scores and generate explanations. The experiments show that the SUMEX framework performed better than ChatGPT.
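The evaluation measures named in the abstract (embedding similarity, top-k precision, false-positive rate) can be illustrated with a minimal Python sketch. This is not the SUMEX implementation: the function names, the toy vectors, and the note identifiers below are all hypothetical, and the standard textbook definitions of each measure are assumed, since the abstract does not state the exact formulas used.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that are relevant (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k


def false_positive_rate(predicted, actual):
    """FP / (FP + TN) over paired boolean predictions and gold labels."""
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    return fp / (fp + tn) if (fp + tn) else 0.0


# Toy usage with made-up data (hypothetical note IDs and labels).
ranked = ["note_a", "note_b", "note_c", "note_d", "note_e"]
relevant = {"note_a", "note_c"}
top3 = precision_at_k(ranked, relevant, 3)
top5 = precision_at_k(ranked, relevant, 5)
fpr = false_positive_rate([True, True, False, False],
                          [True, False, False, False])
```

In a setting like the one described, the ranking would come from sorting candidate notes by their similarity to a query note, and the top-three and top-five precision would then be averaged over queries.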
Pages: 22
Related papers
50 in total
  • [41] Exploiting Syntactic and Semantic Information for Textual Similarity Estimation
    Luo, Jiajia
    Shan, Hongtao
    Zhang, Gaoyu
    Yuan, George
    Zhang, Shuyi
    Yan, Fengting
    Li, Zhiwei
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2021, 2021
  • [42] UESTS: An Unsupervised Ensemble Semantic Textual Similarity Method
    Hassan, Basma
    Abdelrahman, Samir E.
    Bahgat, Reem
    Farag, Ibrahim
    IEEE ACCESS, 2019, 7 : 85462 - 85482
  • [43] A Combination of Enhanced WordNet and BERT for Semantic Textual Similarity
    Ramaiah Institute of Technology, India
    Unknown
    ACM Int. Conf. Proc. Ser., (191-198):
  • [44] Fine-grained Semantic Textual Similarity for Serbian
    Batanovic, Vuk
    Cvetanovic, Milos
    Nikolic, Bosko
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 1370 - 1378
  • [45] A semantic textual similarity measurement model based on the syntactic-semantic representation
    Tang, Zhuo
    Xiao, Qi
    Zhu, Li
    Li, Kenli
    Li, Keqin
    INTELLIGENT DATA ANALYSIS, 2019, 23 (04) : 933 - 950
  • [46] Spectral Learning of Semantic Units in a Sentence Pair to Evaluate Semantic Textual Similarity
    Mehndiratta, Akanksha
    Asawa, Krishna
    8TH INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS, BDA 2020, 2020, 12581 : 49 - 59
  • [47] Supporting Customer Choice with Semantic Similarity Search and Explanation
    Formica, Anna
    Missikoff, Michele
    Pourabbas, Elaheh
    Taglino, Francesco
    ADVANCED INFORMATION SYSTEMS ENGINEERING WORKSHOPS (CAISE), 2013, 148 : 317 - 328
  • [48] A Semantic Logic-Based Approach to Determine Textual Similarity
    Blanco, Eduardo
    Moldovan, Dan
    IEEE Transactions on Audio, Speech and Language Processing, 2015, 23 (04) : 683 - 693
  • [49] Enhancing inter-sentence attention for Semantic Textual Similarity
    Zhao, Ying
    Xia, Tingyu
    Jiang, Yunqi
    Tian, Yuan
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (01)
  • [50] Crosslinguistic Semantic Textual Similarity of Buddhist Chinese and Classical Tibetan
    Felbur, Rafal
    Meelen, Marieke
    Vierthaler, Paul
    JOURNAL OF OPEN HUMANITIES DATA, 2022, 8