An Efficient Approach for Findings Document Similarity Using Optimized Word Mover's Distance

Authors
Dey, Atanu [1]
Jenamani, Mamata [1]
De, Arijit [2]
Affiliations
[1] Indian Inst Technol Kharagpur, Kharagpur, India
[2] Univ Manchester, Manchester, England
Keywords
Word embedding; Document distance; Contextual similarity; Document similarity; Word mover's distance; NLP optimization
DOI
10.1007/978-3-031-45170-6_1
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce Optimized Word Mover's Distance (OWMD), a similarity function that compares two sentences based on their word embeddings. The method determines the degree of semantic similarity between two sentences by considering their interdependent representations. Not all words in a sentence are relevant for determining its aspect-level contextual similarity with another sentence. To account for this, OWMD works in two steps: first, it reduces the system's complexity by selecting words from the sentence pair according to a predefined set of dependency parsing criteria; second, it applies the word mover's distance (WMD) method to the selected words. WMD is used to measure the dissimilarity of two sentences because it represents the minimal "journey time" required for the embedded words of one sentence to reach the embedded words of the other. Finally, applying an exponential function to the inverse of the OWMD dissimilarity score yields the similarity score, called Optimized Word Mover's Similarity (OWMS). On the STSb-Multi-MT dataset, OWMS reduces the MSE, RMSE, and MAD error rates by 66.66%, 40.70%, and 37.93%, respectively, compared with previous approaches. On the Semantic Textual Similarity (STS) dataset, it reduces the same error rates by 85.71%, 62.32%, and 60.17%, respectively. For the STSb-Multi-MT and STS datasets, the proposed approach reduces run-time complexity by 33.54% and 49.43%, respectively, compared with the best existing approaches.
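The two steps described in the abstract (select words, then run WMD on the selection) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `KEEP_RELATIONS` set, the hand-written dependency labels, the 2-D toy embeddings in `EMB`, and the function names. The abstract does not list the actual selection criteria. Word weights are assumed uniform, and the transport problem is solved by brute force over permutations, which is exact but only practical for a handful of words. The final similarity transform is left out because the abstract does not give its exact form.

```python
import math
from itertools import permutations

# Hypothetical selection rule: keep only words with these dependency labels.
KEEP_RELATIONS = {"nsubj", "ROOT", "dobj"}

# Hypothetical 2-D toy embeddings; real systems use pretrained vectors.
EMB = {
    "cat": (0.0, 0.0), "kitten": (0.0, 0.5),
    "sleeps": (4.0, 0.0), "naps": (4.0, 0.5),
}

def select_words(tagged_tokens):
    """Step 1: keep words whose dependency label is in KEEP_RELATIONS."""
    return [word for word, rel in tagged_tokens if rel in KEEP_RELATIONS]

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def wmd_uniform(a_vecs, b_vecs):
    """Step 2: exact WMD for uniformly weighted words (toy sizes only).

    Each side is replicated up to a common size so that every unit carries
    equal mass; the optimal transport plan is then a permutation, which is
    found here by exhaustive search.
    """
    n, m = len(a_vecs), len(b_vecs)
    size = n * m // math.gcd(n, m)  # least common multiple of n and m
    a_units = [v for v in a_vecs for _ in range(size // n)]
    b_units = [v for v in b_vecs for _ in range(size // m)]
    cost = [[euclidean(u, v) for v in b_units] for u in a_units]
    best = min(
        sum(cost[i][perm[i]] for i in range(size))
        for perm in permutations(range(size))
    )
    return best / size

def owmd_sketch(tagged_a, tagged_b):
    """Select words in both sentences, then compute WMD on the selection."""
    a_vecs = [EMB[w] for w in select_words(tagged_a)]
    b_vecs = [EMB[w] for w in select_words(tagged_b)]
    return wmd_uniform(a_vecs, b_vecs)

# Dependency labels are written by hand here; a parser would supply them.
sent_a = [("the", "det"), ("cat", "nsubj"), ("sleeps", "ROOT")]
sent_b = [("a", "det"), ("kitten", "nsubj"), ("naps", "ROOT")]
distance = owmd_sketch(sent_a, sent_b)
print(distance)  # 0.5
```

Dropping the determiners before the transport step shrinks the cost matrix from 3x3 to 2x2, which is the kind of saving the abstract attributes to the selection stage. For realistic sentence lengths a linear-programming or min-cost-flow solver would replace the brute-force search.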
Pages: 3-11
Number of pages: 9