An effective short text conceptualization based on new short text similarity

Cited by: 6
Authors
Bekkali, Mohammed [1]
Lachkar, Abdelmonaime [2]
Affiliations
[1] USMBA, ENSA, LISA Lab, Fes, Morocco
[2] AEU, ENSA, Tangier, Morocco
Keywords
Arabic language; Conceptualization; Word sense disambiguation; Short text similarity; Rough set theory
DOI
10.1007/s13278-018-0544-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recently, short texts such as messages, tweets, and comments have come to make up a large portion of online text data. They are limited in length and differ from traditional documents in their shortness and sparseness. As a result, short text tends to be ambiguous, and the degree of ambiguity is not the same for all languages. Because Arabic is a highly inflectional language in which a single word can have multiple meanings, short text representation plays a vital role in any Text Mining task. To address these issues, we propose an efficient representation for short text based on concepts instead of terms, using BabelNet as an external knowledge source. However, during conceptualization, searching for the concepts that correspond to a polysemic term returns multiple matches. Assigning a term to a concept is therefore a crucial step, and we believe that short text similarity can help overcome the problem of mapping a term to its corresponding concept. In this paper, we reintroduce the Web-based Kernel function for measuring the semantic relatedness between concepts in order to disambiguate an expression against multiple candidate concepts. The proposed method has been evaluated using an Arabic short text categorization system, and the obtained results illustrate the value of our contribution.
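The disambiguation step described in the abstract (choosing one concept among several candidates for a polysemic term by comparing short texts) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: plain bag-of-words cosine similarity stands in for the Web-based Kernel function, a hand-written dictionary of concept descriptions stands in for BabelNet, and the example is in English rather than Arabic. The names `bag_of_words`, `cosine`, `disambiguate`, and the concept labels are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, whitespace-tokenize, and count terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(short_text, candidates):
    """Return the candidate concept whose description best matches the text.

    `candidates` maps a concept label to a short textual description.
    Cosine similarity is an assumed stand-in for the paper's similarity measure.
    """
    context = bag_of_words(short_text)
    return max(
        candidates,
        key=lambda label: cosine(context, bag_of_words(candidates[label])),
    )


# Hypothetical candidate concepts for the ambiguous term "bank".
candidates = {
    "bank_(finance)": "financial institution that accepts deposits and lends money",
    "bank_(river)": "sloping land beside a river or lake",
}

chosen = disambiguate("she deposits money at the bank", candidates)
print(chosen)
```

In this toy setup the financial sense is selected because its description shares terms with the input text, while the river sense shares none. A real system would replace the hand-written descriptions with concept information retrieved from the knowledge base and the cosine measure with the similarity function the authors actually use.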
Pages: 11