An effective short text conceptualization based on new short text similarity

Cited by: 6
Authors
Bekkali, Mohammed [1]
Lachkar, Abdelmonaime [2]
Affiliations
[1] USMBA, ENSA, LISA Lab, Fes, Morocco
[2] AEU, ENSA, Tangier, Morocco
Keywords
Arabic language; Conceptualization; Word sense disambiguation; Short text similarity; Rough set theory
DOI
10.1007/s13278-018-0544-8
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recently, short texts such as messages, tweets, and comments have come to make up a large portion of online text data. They are limited in length and differ from traditional documents in their shortness and sparseness. As a result, short text tends to be ambiguous, and the degree of ambiguity is not the same for all languages. Because Arabic is a highly inflectional language in which a single word can have multiple meanings, short text representation plays a vital role in any text mining task. To address these issues, we propose an efficient representation for short text based on concepts instead of terms, using BabelNet as an external knowledge source. However, during conceptualization, searching for the concepts that correspond to a polysemous term returns multiple matches. Assigning a term to a concept is therefore a crucial step, and we believe that short text similarity can help overcome the problem of mapping a term to its corresponding concept. In this paper, we reintroduce the Web-based Kernel function for measuring the semantic relatedness between concepts, in order to disambiguate an expression against multiple candidate concepts. The proposed method has been evaluated using an Arabic short text categorization system, and the results obtained illustrate the value of our contribution.
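The disambiguation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a common formulation of a web-based kernel: each short text is expanded with retrieved snippets, every snippet becomes an L2-normalized term vector, the vectors are averaged into a normalized centroid, and two texts are compared by the dot product of their centroids. The snippets here are hard-coded, hypothetical English examples rather than search engine results, plain term frequencies stand in for TF-IDF weights, and the candidate concept labels are invented for illustration rather than taken from BabelNet.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a sparse term vector to unit length (empty dict if all zero)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def expand(snippets):
    """Normalized centroid of the normalized term-frequency vectors of the snippets."""
    centroid = Counter()
    for snippet in snippets:
        tf = Counter(snippet.lower().split())
        for term, weight in l2_normalize(tf).items():
            centroid[term] += weight
    return l2_normalize(centroid)


def kernel(a, b):
    """Dot product of two expanded (unit-length) representations."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def disambiguate(context_snippets, candidates):
    """Return the candidate concept whose expansion is closest to the context."""
    ctx = expand(context_snippets)
    scores = {c: kernel(ctx, expand(s)) for c, s in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical data: snippets assumed to be retrieved for a short text
# containing the ambiguous term "bank", and for two candidate concepts.
context = ["bank loan interest rate", "bank account money deposit"]
candidates = {
    "bank (financial institution)": [
        "financial institution money loan deposit",
        "bank lends money charges interest",
    ],
    "bank (river side)": [
        "river bank land water edge",
        "sloping land beside river",
    ],
}

best, scores = disambiguate(context, candidates)
print(best)
```

With these toy snippets the financial sense shares several terms with the context while the river sense shares only "bank", so the financial concept scores higher. In a real system the snippets would come from a retrieval step and the candidates from the knowledge base lookup.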
Pages: 11