Distributional Semantic Model Based on Convolutional Neural Network for Arabic Textual Similarity

Cited by: 3
Authors
Mahmoud, Adnen [1]
Zrigui, Mounir [2]
Affiliations
[1] Higher Inst Comp Sci & Commun Tech, Monastir, Tunisia
[2] Fac Sci Monastir, Monastir, Tunisia
Keywords
Arabic Language; Context Based Approach; Global Vectors Representation; Natural Language Processing; Paraphrase Detection; Semantic Similarity; Word Embedding; Word2vec;
DOI
10.4018/IJCINI.2020010103
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The problem addressed is how to build a model that reliably identifies whether a previously unseen pair of documents is a paraphrase pair. Paraphrase detection in Arabic documents is challenging because of the language's feature variability and the lack of publicly available corpora. To address these problems, the authors propose a semantic approach. At the feature extraction level, they use global vectors representation (GloVe), which combines global co-occurrence counting with a contextual skip-gram model. At the paraphrase identification level, they apply a convolutional neural network (CNN) to learn richer contextual and semantic information shared between documents. For the experiments, the Open Source Arabic Corpora serve as the source corpus, and several additional datasets are collected to build a vocabulary model. To construct the paraphrased corpus, each word in the source corpus is replaced by its most similar word of the same grammatical class, using the word2vec algorithm and part-of-speech annotation. Experiments show that the model achieves promising precision and recall compared with existing approaches in the literature.
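The feature-extraction step relies on global vectors (GloVe). As a minimal sketch of the counting side of that representation, assuming the standard GloVe formulation rather than anything specific to this paper, the code below builds distance-weighted co-occurrence counts (a pair of words at distance d adds 1/d to X_ij) and evaluates the weighted least-squares objective J = sum over (i, j) of f(X_ij) * (w_i . c_j + b_i + b'_j - log X_ij)^2, where f(x) = (x / x_max)^alpha for x < x_max and 1 otherwise. The function names and the defaults x_max = 100 and alpha = 0.75 (values commonly used with GloVe) are assumptions, not settings reported by the authors, and no training loop is shown.

```python
import math
from collections import defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count (word, context) co-occurrences inside a symmetric window.

    A pair of tokens at distance d contributes 1/d (GloVe-style weighting).
    """
    counts = defaultdict(float)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1.0 / abs(i - j)
    return dict(counts)


def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x): damps rare pairs and caps frequent pairs at 1.0."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(counts, w, c, b_w, b_c):
    """Weighted least-squares GloVe objective summed over observed pairs.

    w, c: dicts mapping a token to its word / context vector (list of floats)
    b_w, b_c: dicts mapping a token to its word / context bias
    """
    loss = 0.0
    for (i, j), x_ij in counts.items():
        dot = sum(a * b for a, b in zip(w[i], c[j]))
        diff = dot + b_w[i] + b_c[j] - math.log(x_ij)
        loss += glove_weight(x_ij) * diff ** 2
    return loss


if __name__ == "__main__":
    counts = cooccurrence_counts([["a", "b", "c"]], window=2)
    print(counts)  # ("a", "c") is two positions apart, so it gets 0.5
    zero = {t: [0.0, 0.0] for t in "abc"}
    bias = {t: 0.0 for t in "abc"}
    print(glove_loss(counts, zero, zero, bias, bias))
```

Training would minimise `glove_loss` over all vectors and biases on a large Arabic corpus; existing GloVe tooling handles that, so the sketch stops at the objective.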
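The abstract does not spell out the CNN architecture. One common way to score a document pair with a CNN, assumed here purely for illustration, is a Siamese encoder: the same 1-D convolution filters slide over each document's word vectors, ReLU plus max-over-time pooling turns each document into a fixed-length feature vector, and the two vectors are compared by cosine similarity. The sketch below implements only that forward pass in plain Python; the function names, filter shapes and the cosine comparison are hypothetical choices, and real weights would be learned (for example with a classifier on top of the pooled features) rather than set by hand.

```python
import math


def conv1d_max_pool(x, filters, biases):
    """Encode a document (list of word vectors) as a fixed-length vector.

    filters: list of F filters, each a list of k rows of length d (window width k)
    biases: list of F floats
    Returns F features: per filter, max over positions of ReLU(window . filter + bias).
    """
    k = len(filters[0])
    if len(x) < k:
        raise ValueError("document has fewer words than the filter width")
    d = len(x[0])
    features = []
    for w, b in zip(filters, biases):
        best = 0.0  # starting from 0 applies ReLU: max(0, s_1, ..., s_m)
        for i in range(len(x) - k + 1):
            s = b + sum(w[j][t] * x[i + j][t] for j in range(k) for t in range(d))
            best = max(best, s)
        features.append(best)
    return features


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def paraphrase_score(doc_a, doc_b, filters, biases):
    """Siamese scoring: shared filters encode both documents, cosine compares them."""
    return cosine(conv1d_max_pool(doc_a, filters, biases),
                  conv1d_max_pool(doc_b, filters, biases))


if __name__ == "__main__":
    # Hand-set toy weights: 2 filters of width 2 over 2-D word vectors.
    filters = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
    biases = [0.0, -0.1]
    doc_a = [[1.0, 2.0], [0.5, 0.5], [2.0, 1.0]]
    doc_b = [[0.0, 1.0], [1.0, 0.0]]
    print(conv1d_max_pool(doc_a, filters, biases))
    print(paraphrase_score(doc_a, doc_b, filters, biases))
```

Because ReLU features are non-negative, the score falls in [0, 1]; a threshold on it, or a trained classifier, would then decide between paraphrase and non-paraphrase.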
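The paraphrased corpus is built by substituting each word with its most similar word of the same grammatical class. A minimal sketch of that substitution is shown below, assuming precomputed word vectors (here tiny hand-made 2-D toy vectors, not trained word2vec embeddings) and a word-to-tag lookup standing in for contextual part-of-speech annotation. `most_similar_same_pos`, `paraphrase_sentence`, `TOY_VECTORS` and `TOY_POS` are hypothetical names; with gensim, the (word, similarity) candidates returned by `KeyedVectors.most_similar` could be filtered by tag in the same way.

```python
import math

# Toy 2-D vectors and POS tags: illustrative values only, not trained word2vec output.
TOY_VECTORS = {
    "كتب": [1.0, 0.1],   # "wrote" (VERB)
    "ألف": [0.9, 0.2],   # "composed / authored" (VERB)
    "قرأ": [0.0, 1.0],   # "read" (VERB)
    "كتاب": [1.0, 0.0],  # "book" (NOUN)
    "مؤلف": [0.8, 0.3],  # "written work" (NOUN)
}
TOY_POS = {"كتب": "VERB", "ألف": "VERB", "قرأ": "VERB", "كتاب": "NOUN", "مؤلف": "NOUN"}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar_same_pos(word, vectors, pos_tags):
    """Nearest neighbour of `word` by cosine among words with the same tag, else None."""
    target_pos = pos_tags.get(word)
    if word not in vectors or target_pos is None:
        return None
    best, best_sim = None, -math.inf
    for cand, vec in vectors.items():
        if cand == word or pos_tags.get(cand) != target_pos:
            continue  # skip the word itself and words of another grammatical class
        sim = cosine(vectors[word], vec)
        if sim > best_sim:
            best, best_sim = cand, sim
    return best


def paraphrase_sentence(tokens, vectors, pos_tags):
    """Replace each token by its most similar same-POS word; keep it if none exists."""
    out = []
    for tok in tokens:
        sub = most_similar_same_pos(tok, vectors, pos_tags)
        out.append(sub if sub is not None else tok)
    return out


if __name__ == "__main__":
    print(paraphrase_sentence(["كتب", "و", "كتاب"], TOY_VECTORS, TOY_POS))
```

In this toy data the noun كتاب is actually the nearest vector to the verb كتب, so without the tag filter the verb would be swapped for a noun; the POS constraint is what keeps the substitution grammatical.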
Pages: 35-50
Number of pages: 16
Related papers
  • [1] Sentence Embedding and Convolutional Neural Network for Semantic Textual Similarity Detection in Arabic Language
    Mahmoud, Adnen
    Zrigui, Mounir
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2019, 44 (11) : 9263 - 9274
  • [2] A Fuzzy Multigranularity Convolutional Neural Network With Double Attention Mechanisms for Measuring Semantic Textual Similarity
    Zhao, Butian
    Zhang, Runtong
    Bai, Kaiyuan
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2024, 32 (10) : 5762 - 5776
  • [3] Detection of medical text semantic similarity based on convolutional neural network
    Zheng, Tao
    Gao, Yimei
    Wang, Fei
    Fan, Chenhao
    Fu, Xingzhi
    Li, Mei
    Zhang, Ya
    Zhang, Shaodian
    Ma, Handong
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2019, 19 (01)
  • [4] Convolutional Network-Based Semantic Similarity Model of Sentences
    Huang, J.-P.
    Ji, D.-H.
    South China University of Technology, 2017, 45 : 68 - 75
  • [5] Sentence Semantic Similarity Model Using Convolutional Neural Networks
    Karthiga, M.
    Sountharrajan, S.
    Suganya, E.
    Sankarananth, S.
    EAI Endorsed Transactions on Energy Web, 2021, 8 (35) : 1 - 6
  • [6] A semantic textual similarity measurement model based on the syntactic-semantic representation
    Tang, Zhuo
    Xiao, Qi
    Zhu, Li
    Li, Kenli
    Li, Keqin
    INTELLIGENT DATA ANALYSIS, 2019, 23 (04) : 933 - 950
  • [7] Security Enhanced Sentence Similarity Computing Model Based on Convolutional Neural Network
    Sun, Qifeng
    Huang, Xingzhe
    Kibalya, Godfrey
    Kumar, Neeraj
    Kumar, Santhosh S. V. N.
    Zhang, Peiying
    Xie, Dongliang
    IEEE ACCESS, 2021, 9 (09) : 104183 - 104196
  • [8] A novel sentence similarity model with word embedding based on convolutional neural network
    Yao, Haipeng
    Liu, Huiwen
    Zhang, Peiying
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (23)