SUMEX: A hybrid framework for Semantic textUal siMilarity and EXplanation generation

Cited by: 0

Authors:

Saeed, Sumaira [1]

Rajput, Quratulain [1]

Haider, Sajjad [1]

Affiliations:

[1] Univ Karachi, Inst Business Adm, Artificial Intelligence Lab, Univ Rd, Karachi 75270, Pakistan

Source:

INFORMATION PROCESSING & MANAGEMENT | 2024 / Vol. 61 / Issue 05

Keywords:

Semantic Textual Similarity (STS); Explanation generation; Natural language processing; Embeddings; Clinical notes; Ontology

DOI:

10.1016/j.ipm.2024.103771

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Measuring the semantic similarity between two pieces of text is a well-known problem in natural language processing (NLP). It has many applications, such as finding similar medical notes of patients to accelerate diagnosis, plagiarism detection, and document clustering. Most state-of-the-art models are based on machine or deep learning and lack sufficient explanations for their results, which limits their adoption in critical domains such as healthcare. This paper presents SUMEX (Semantic textUal siMilarity and EXplanation generation), a hybrid framework that uniquely combines an ontology with a state-of-the-art embedding-based model for semantic textual similarity. The primary strength of the framework is that it explains its results in human-understandable natural language, which is vital in critical domains such as healthcare. Experiments were conducted on two datasets of clinical notes using four embeddings: ScispaCy, BioWord2Vec, ClinicalBERT, and a customized Word2Vec trained on clinical notes. On the ClinicalSTS benchmark datasets, SUMEX outperforms the embedding-based model, improving average precision scores by 7% and reducing the false-positive rate by 23%. On the Patients Similarity Dataset, SUMEX improved the average top-five and top-three precision scores by 14% and 10%, respectively. Domain experts evaluated the quality of the natural-language explanations that SUMEX generates; the results show that the explanations are of high quality, scoring 90% and 93% on Completeness and Correctness, respectively. In addition, ChatGPT was used to produce similarity scores and explanations, and the experiments show that SUMEX performed better than ChatGPT.


Pages: 22
