The Role of Transliterated Words in Linking Bilingual News Articles in an Archive

Cited by: 1
Authors
Khan, Muzammil [1]
Khan, Sarwar Shah [1]
Alharbi, Yasser [2]
Alferaidi, Ali [2]
Alharbi, Talal Saad [2]
Yadav, Kusum [2]
Affiliations
[1] Univ Swat, Dept Comp & Software Technol, Mingora 19130, Pakistan
[2] Univ Hail, Coll Comp Sci & Engn, Hail 55473, Saudi Arabia
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 07
Keywords
transliterated words; news archiving; news linking; dual lingual archive; digital libraries; similarity measure; RECOMMENDATION;
DOI
10.3390/app13074435
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Retrieving a specific digital information object in response to a user query from a huge, evolving, multilingual news archive is challenging. Processing becomes harder still when the archive includes low-resource, morphologically complex languages such as those written in Urdu and Arabic scripts. Computing similarity between a query and news articles, and among the articles themselves, in such huge and evolving collections may be inaccurate and time-consuming at run time. This paper introduces a Similarity Measure based on Transliteration Words (SMTW), that is, English words written in Urdu script, for linking news articles extracted from multiple online sources during the preservation process. The SMTW links Urdu-to-English news articles using an upgraded Urdu-to-English lexicon that includes transliterated words. The SMTW was evaluated exhaustively on datasets of different sizes, and the results were compared with the Common Ratio Measure for Dual Language (CRMDL). The experimental results show that the SMTW was more effective than the CRMDL for linking Urdu-to-English news articles: precision improved from 50% to 60%, recall improved from 67% to 82%, and the impact of common terms also improved.
Pages: 17
Related papers
24 in total
  • [1] Understanding the Research Challenges in Low-Resource Language and Linking Bilingual News Articles in Multilingual News Archive
    Khan, Muzammil
    Ullah, Kifayat
    Alharbi, Yasser
    Alferaidi, Ali
    Alharbi, Talal Saad
    Yadav, Kusum
    Alsharabi, Naif
    Ahmad, Aakash
    APPLIED SCIENCES-BASEL, 2023, 13 (15):
  • [2] HNTSumm: Hybrid text summarization of transliterated news articles
    Muniraj P.
    Sabarmathi K.R.
    Leelavathi R.
    Balaji B S.
    International Journal of Intelligent Networks, 2023, 4 : 53 - 61
  • [3] A content-based technique for linking dual language news articles in an archive
    Khan, Muzammil
    Rahman, Arif Ur
    Ahmad, Arshad
    Khan, Sarwar Shah
    JOURNAL OF INFORMATION SCIENCE, 2022, 48 (01) : 57 - 70
  • [4] Public News Archive: A Searchable Sub-archive to Portuguese Past News Articles
    Campos, Ricardo
    Correia, Diogo
    Jatowt, Adam
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III, 2023, 13982 : 211 - 216
  • [5] Providing Web Archive News Articles as Corpus Data
    Tonnessen, Jon Carlstedt
    Birkenes, Magnus Breder
    JOURNAL OF OPEN HUMANITIES DATA, 2025, 11
  • [6] HarriGT: Linking news articles to scientific literature
    Ravenscroft, James
    Clare, Amanda
    Liakata, Maria
    56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2018): PROCEEDINGS OF SYSTEM DEMONSTRATIONS, 2018, : 19 - 24
  • [7] Using Document Embeddings for Background Linking of News Articles
    Khloponin, Pavel
    Kosseim, Leila
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2021), 2021, 12801 : 317 - 329
  • [8] Extracting the opinions of news articles based on emotionally laden words
    Shinomiya, Mizuho
    Ren, Fuji
    Kuroiwa, Shingo
    Tsuchiya, Seiji
    PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE'07), 2007, : 262 - +
  • [9] FrNewsLink : a corpus linking TV Broadcast News Segments and Press Articles
    Camelin, Nathalie
    Damnati, Geraldine
    Bouchekif, Abdessalam
    Landeau, Anais
    Charlet, Delphine
    Esteve, Yannick
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 2087 - 2092
  • [10] Detection of Characteristic Co-Occurrence Words from News Articles on the Web
    Xiao, Feng
    Noro, Tomoya
    Tokuda, Takehiro
    INFORMATION MODELLING AND KNOWLEDGE BASES XXIII, 2012, 237 : 187 - 203