NESM: a Named Entity based Proximity Measure for Multilingual News Clustering

Cited by: 0
Authors
Montalvo, Soto [1 ]
Fresno, Victor [2 ]
Martinez, Raquel [2 ]
Affiliations
[1] Univ Rey Juan Carlos, Dpto Ciencias Computac, Madrid, Spain
[2] UNED, Dpto Lenguajes Sistemas Informáticos, Madrid, Spain
Source
Keywords
Named Entity; Multilingual Clustering; Document Similarity;
DOI
Not available
Chinese Library Classification number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
Measuring the similarity between documents is an essential task in document clustering. This paper presents a new metric based on the number and the category of the Named Entities shared between news documents. Three different feature-weighting functions and two standard similarity measures were used to evaluate the quality of the proposed measure in multilingual news clustering. The results, on three different collections of comparable news written in English and Spanish, indicate that the new metric in some cases performs better than standard similarity measures such as cosine similarity and the correlation coefficient.
Pages: 81-88
Page count: 8
Related papers
50 in total
  • [1] Language Clustering for Multilingual Named Entity Recognition
    Shaffer, Kyle
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 40 - 45
  • [2] Multilingual news document clustering: Two algorithms based on cognate named entities
    Montalvo, Soto
    Martinez, Raquel
    Casillas, Arantza
    Fresno, Victor
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2006, 4188 : 165 - 172
  • [3] Clustering of Multi-Word Named Entity variants: Multilingual Evaluation
    Jacquet, Guillaume
    Ehrmann, Maud
    Steinberger, Ralf
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 2548 - 2553
  • [4] Theoretical Linguistics Rivals Embeddings in Language Clustering for Multilingual Named Entity Recognition
    Imai, Sakura
    Kawahara, Daisuke
    Orita, Naho
    Oda, Hiromune
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-SRW 2023, VOL 4, 2023, : 139 - 151
  • [5] Multilingual Transformers for Named Entity Recognition
    Viksna, Rinalds
    Skadin, Inguna
    BALTIC JOURNAL OF MODERN COMPUTING, 2022, 10 (03): : 457 - 469
  • [6] Event Graph-Based News Clustering: The Role of Named Entity-Centered Subgraphs
    Komecoglu, Basak Buluz
    Yilmaz, Burcu
    IEEE ACCESS, 2024, 12 : 105613 - 105632
  • [7] Named Entity Based Ranking with Term Proximity for XML Retrieval
    Roko, Abubakar
    Doraisamy, Shyamala
    Azman, Azreen
    Jantan, Azrul Hazri
    INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2018, 8 (02) : 57 - 77
  • [8] Fuzzy Named Entity-Based Document Clustering
    Cao, Tru H.
    Do, Hai T.
    Hong, Dung T.
    Quan, Tho T.
    2008 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-5, 2008, : 2030 - 2036
  • [9] Firefly Algorithm Based Multilingual Named Entity Recognition for Indian Languages
    Biswas, Sitanath
    Dash, Sujata
    Acharya, Sweta
    ADVANCED INFORMATICS FOR COMPUTING RESEARCH, ICAICR 2018, PT I, 2019, 955 : 540 - 552
  • [10] Multilingual news clustering: Feature translation vs. identification of cognate named entities
    Montalvo, S.
    Martinez, R.
    Casillas, A.
    Fresno, V.
    PATTERN RECOGNITION LETTERS, 2007, 28 (16) : 2305 - 2311