Semantic textual similarity between sentences using bilingual word semantics

Cited by: 21
Authors
Shajalal, Md [1 ]
Aono, Masaki [2 ]
Affiliations
[1] Bangladesh Agr Univ, Dept Comp Sci & Math, Mymensingh 2202, Bangladesh
[2] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi, Aichi, Japan
Funding
Japan Society for the Promotion of Science;
Keywords
Semantic similarity; Word semantics; Word-embedding; Textual similarity; Bilingual semantics;
DOI
10.1007/s13748-019-00180-4
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semantic textual similarity between sentences is indispensable for many information retrieval tasks. Traditional lexical similarity measures cannot compute similarity beyond a trivial level; moreover, they capture only textual similarity, not semantic similarity. In this paper, we propose a method for semantic textual similarity that leverages bilingual word-level semantics to compute the semantic similarity between sentences. To capture word-level semantics, we employ distributed representations of words in two different languages. A similarity function based on the concept-to-concept relationships corresponding to the words is also used for the same purpose. Multiple new semantic similarity measures are introduced, based on word-embedding models trained on two different corpora in two different languages. In addition, another new semantic similarity measure is introduced that uses word sense comparison. The similarity score between two sentences is then computed by applying a linear ranking approach to all proposed measures, with the importance score of each measure estimated by a supervised feature selection technique. We conducted experiments on the SemEval Semantic Textual Similarity (STS-2017) test collections. The experimental results demonstrate that our method is effective for measuring semantic textual similarity and outperforms some known related methods.
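The abstract names two ingredients: embedding-based similarity measures computed in two languages, and a weighted linear combination of those measures. The following is a minimal Python sketch of that general idea, not the authors' implementation. The toy embedding tables, the averaged-vector sentence representation, the function names, and the weights are all assumptions made for illustration; the abstract specifies none of them.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens found in the embedding table.

    Averaging is an assumed, common baseline, not the measure from the paper.
    Returns None when no token is in the table.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embedding_similarity(tokens_a, tokens_b, embeddings):
    """Similarity of two tokenized sentences under one embedding table."""
    vec_a = sentence_vector(tokens_a, embeddings)
    vec_b = sentence_vector(tokens_b, embeddings)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)


def combined_score(measures, weights):
    """Weighted linear combination of similarity measures, normalized by total weight."""
    return sum(w * m for w, m in zip(weights, measures)) / sum(weights)


# Hypothetical toy embedding tables for two languages (values are made up).
EMB_LANG_A = {"cat": [0.9, 0.1], "kitten": [0.8, 0.2], "car": [0.1, 0.9]}
EMB_LANG_B = {"gato": [0.7, 0.3], "gatito": [0.6, 0.4], "coche": [0.2, 0.8]}

# Each sentence is assumed to be available in both languages (e.g. via translation).
sim_a = embedding_similarity(["cat"], ["kitten"], EMB_LANG_A)
sim_b = embedding_similarity(["gato"], ["gatito"], EMB_LANG_B)

# Hypothetical importance weights; the paper estimates these with supervised
# feature selection, which this sketch does not reproduce.
score = combined_score([sim_a, sim_b], weights=[0.6, 0.4])
print(round(score, 3))
```

Normalizing by the total weight keeps the combined score in the same range as the individual measures, which makes it straightforward to rescale to a gold-standard similarity scale afterwards.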
Pages: 263-272
Page count: 10