NESM: a Named Entity based Proximity Measure for Multilingual News Clustering

Cited by: 0
Authors
Montalvo, Soto [1]
Fresno, Victor [2]
Martinez, Raquel [2]
Affiliations
[1] Univ Rey Juan Carlos, Dpto Ciencias Computac, Madrid, Spain
[2] UNED, Dpto Lenguajes Sistemas Informáticos, Madrid, Spain
Keywords
Named Entity; Multilingual Clustering; Document Similarity;
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
Measuring the similarity between documents is an essential task in document clustering. This paper presents a new metric based on the number and the category of the Named Entities shared between news documents. Three different feature-weighting functions and two standard similarity measures were used to evaluate the quality of the proposed measure in multilingual news clustering. The results, obtained on three different collections of comparable news written in English and Spanish, indicate that the new metric in some cases performs better than standard similarity measures such as cosine similarity and the correlation coefficient.
Pages: 81-88
Page count: 8