NESM: a Named Entity based Proximity Measure for Multilingual News Clustering

Authors
Montalvo, Soto [1]
Fresno, Victor [2]
Martinez, Raquel [2]
Affiliations
[1] Univ Rey Juan Carlos, Dpto Ciencias Computac, Madrid, Spain
[2] UNED, Dpto Lenguajes Sistemas Informáticos, Madrid, Spain
Source
Keywords
Named Entity; Multilingual Clustering; Document Similarity
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Measuring the similarity between documents is an essential task in document clustering. This paper presents a new metric based on the number and the category of the named entities shared between news documents. Three different feature-weighting functions and two standard similarity measures were used to evaluate the quality of the proposed measure in multilingual news clustering. The results, obtained on three different collections of comparable news written in English and Spanish, indicate that the new metric in some cases performs better than standard similarity measures such as cosine similarity and the correlation coefficient.
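For orientation, the two standard baselines the abstract names (cosine similarity and the correlation coefficient) can be sketched as below over sparse named-entity weight vectors, together with a naive count of shared entities per category. This is a minimal sketch under stated assumptions: each document is a dictionary mapping a named entity to a weight, English and Spanish entity mentions are assumed to be already normalized to a common form, and the correlation is computed only over the union of the two documents' features (over a full collection vocabulary the value would differ). The helper `shared_entities_by_category` is hypothetical and purely illustrative; it is not the NESM formula, which the abstract does not give.

```python
import math
from collections import Counter


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def correlation(u: dict[str, float], v: dict[str, float]) -> float:
    """Pearson correlation coefficient, computed over the union of both
    documents' features (an assumption; a collection-wide vocabulary
    would add shared zero dimensions and change the value)."""
    terms = sorted(set(u) | set(v))
    n = len(terms)
    if n == 0:
        return 0.0
    x = [u.get(t, 0.0) for t in terms]
    y = [v.get(t, 0.0) for t in terms]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def shared_entities_by_category(
    a: set[tuple[str, str]], b: set[tuple[str, str]]
) -> Counter:
    """Hypothetical helper: count the (entity, category) pairs two documents
    share, grouped by category. Illustrative only, not NESM itself."""
    return Counter(category for (_entity, category) in a & b)


if __name__ == "__main__":
    doc_en = {"madrid": 2.0, "spain": 1.0}
    doc_es = {"madrid": 1.0, "spain": 1.0, "uned": 1.0}
    print(round(cosine(doc_en, doc_es), 4))  # 0.7746

    ents_en = {("madrid", "LOC"), ("uned", "ORG"), ("soto montalvo", "PER")}
    ents_es = {("madrid", "LOC"), ("uned", "ORG"), ("spain", "LOC")}
    print(shared_entities_by_category(ents_en, ents_es))
```

The per-category counter is shown only to make concrete what "number and category of shared named entities" refers to; how the paper combines these counts into a single proximity score is not stated in this record.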
Pages: 81–88
Page count: 8