Effects of diacritics on Turkish information retrieval

Cited by: 7
Authors
Alpkocak, Adil [1]
Ceylan, Meltem [1]
Affiliation
[1] Dokuz Eylul Univ, Dept Comp Engn, TR-35160 Izmir, Turkey
Keywords
Turkish information retrieval; diacritics; document expansion; query expansion
DOI
10.3906/elk-1010-819
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We investigate the effects of improper use of diacritics in the Turkish alphabet on information retrieval. A diacritic is a supplementary sign added to a letter to change its sound value, and the Turkish alphabet has 5 special letters derived from Latin letters by adding different diacritics. The statistical analysis performed in this study shows that retrieval performance decreases significantly when documents and queries use different letter forms, that is, when documents contain letters with diacritics while queries contain only standard Latin letters, or vice versa. To tackle this problem, we propose 3 approaches: token normalization by equivalence classes, document expansion, and query expansion. Experimental evaluations carried out on the Bilkent Turkish information retrieval test collection suggest that the proposed approaches are promising remedies.
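The abstract names token normalization by equivalence classes and query expansion without giving details, so the following is only a minimal sketch of how such techniques are commonly implemented, not the authors' method. The letter mapping (`FOLD`, `VARIANTS`) and the function names `normalize` and `expand` are assumptions made for illustration; the sketch also assumes tokens are already lowercased by a Turkish-aware step, since it does no case folding itself.

```python
from itertools import product

# Assumed equivalence classes: each Turkish-specific letter is folded to a
# plain Latin counterpart. This mapping is illustrative, not taken from the paper.
FOLD = str.maketrans({
    "ç": "c", "ğ": "g", "ı": "i", "ö": "o", "ş": "s", "ü": "u",
    "Ç": "C", "Ğ": "G", "İ": "I", "Ö": "O", "Ş": "S", "Ü": "U",
})

# Assumed inverse view for expansion (lowercase letters only).
VARIANTS = {"c": "cç", "g": "gğ", "i": "iı", "o": "oö", "s": "sş", "u": "uü"}


def normalize(token: str) -> str:
    """Map a token to the representative of its equivalence class.

    Applying this to both indexed tokens and query tokens makes
    'şeker' and 'seker' match.
    """
    return token.translate(FOLD)


def expand(token: str) -> set[str]:
    """Generate every diacritic variant of a lowercase token.

    A query expansion step could OR these variants together; a document
    expansion step could index them alongside the original token.
    """
    base = normalize(token)
    choices = [VARIANTS.get(ch, ch) for ch in base]
    return {"".join(combo) for combo in product(*choices)}


if __name__ == "__main__":
    print(normalize("çağrı"))
    print(sorted(expand("su")))
```

Normalization keeps the index small but merges distinct words that differ only in diacritics, while expansion preserves the original forms at the cost of a variant set that doubles with every ambiguous letter in the token.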
Pages: 787-804
Number of pages: 18