Effects of diacritics on Turkish information retrieval

Times cited: 7
Authors
Alpkocak, Adil [1]
Ceylan, Meltem [1]
Affiliation
[1] Dokuz Eylul Univ, Dept Comp Engn, TR-35160 Izmir, Turkey
Keywords
Turkish information retrieval; diacritics; document expansion; query expansion
DOI
10.3906/elk-1010-819
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We investigate the effects of improper use of diacritics in the Turkish alphabet on information retrieval. A diacritic is a supplementary sign added to a letter to change its sound value, and the Turkish alphabet has 5 special letters derived from Latin letters by adding different diacritics. The statistical analysis performed in this study shows that retrieval performance decreases significantly when documents and queries use different forms of these letters, for example when documents contain letters with diacritics while queries use the corresponding standard Latin letters, or vice versa. To tackle this challenge, we propose 3 approaches: token normalization by equivalence classes, document expansion, and query expansion. Experimental evaluations carried out on the Bilkent Turkish information retrieval test collection suggest that the proposed approaches are promising remedies in this line of research.
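The three remedies named in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the folding table (including the dotless ı / dotted İ pair, which goes beyond the 5 diacritic letters the abstract mentions), the mode names `plain`, `normalize`, `doc_expand`, and `query_expand`, and all helper functions are hypothetical, and whitespace tokenization stands in for whatever preprocessing the study actually used.

```python
from collections import defaultdict
from itertools import product

# Hypothetical folding table (an assumption, not taken from the paper): each
# Turkish letter with a diacritic (ç, ğ, ö, ş, ü), plus the ı/İ pair, is mapped
# to one plain Latin representative of its equivalence class.
FOLD = str.maketrans({
    "ç": "c", "Ç": "c", "ğ": "g", "Ğ": "g",
    "ö": "o", "Ö": "o", "ş": "s", "Ş": "s",
    "ü": "u", "Ü": "u", "ı": "i", "İ": "i",
})

# Reverse view for query expansion: plain letter -> every letter it may stand for.
VARIANTS = {"c": "cç", "g": "gğ", "o": "oö", "s": "sş", "u": "uü", "i": "iı"}


def normalize(token: str) -> str:
    """Equivalence-class normalization: fold diacritics, then lowercase."""
    # Translate before lower(): Python lowercases "İ" to "i" + U+0307 (combining dot).
    return token.translate(FOLD).lower()


def expand_query_term(term: str) -> set[str]:
    """Query expansion: every spelling of `term` with or without diacritics."""
    choices = [VARIANTS.get(ch, ch) for ch in normalize(term)]
    return {"".join(p) for p in product(*choices)}


def build_index(docs: dict[str, str], mode: str) -> dict[str, set[str]]:
    """Inverted index; `mode` is one of the hypothetical names listed above."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in text.split():
            if mode == "normalize":
                # Index only the folded form of each token.
                index[normalize(tok)].add(doc_id)
            else:
                index[tok.lower()].add(doc_id)
                if mode == "doc_expand":
                    # Document expansion: also index the folded form.
                    index[normalize(tok)].add(doc_id)
    return index


def search(index: dict[str, set[str]], term: str, mode: str) -> set[str]:
    """Single-term lookup; returns the ids of matching documents."""
    if mode == "normalize":
        keys = {normalize(term)}
    elif mode == "query_expand":
        keys = expand_query_term(term)
    else:  # "plain" and "doc_expand" look up the lowercased term as typed
        keys = {term.lower()}
    hits: set[str] = set()
    for key in keys:
        hits |= index.get(key, set())
    return hits


if __name__ == "__main__":
    docs = {"d1": "çiçek bahçesi", "d2": "cicek bahcesi"}  # toy corpus
    for mode in ("plain", "normalize", "doc_expand", "query_expand"):
        print(mode, sorted(search(build_index(docs, mode), "cicek", mode)))
```

On this two-document toy corpus, the ASCII query `cicek` matches only `d2` in `plain` mode but both documents in the other three modes. Document expansion as sketched here only helps ASCII queries against documents written with diacritics; folding and variant generation can also conflate genuinely different words that differ only in their diacritics, which may cost precision.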
Pages: 787-804
Page count: 18
Related papers (50 in total)
  • [21] EFFECTS OF ECS AND HYPOXIA ON INFORMATION-RETRIEVAL
    DANDREA, JA
    KESNER, RP
    PHYSIOLOGY & BEHAVIOR, 1973, 11 (06) : 747 - 752
  • [22] Pitch-frequency histogram-based music information retrieval for Turkish music
    Gedik, Ali C.
    Bozkurt, Baris
    SIGNAL PROCESSING, 2010, 90 (04) : 1049 - 1063
  • [23] Interfering Effects of Retrieval in Learning New Information
    Finn, Bridgid
    Roediger, Henry L., III
    JOURNAL OF EXPERIMENTAL PSYCHOLOGY-LEARNING MEMORY AND COGNITION, 2013, 39 (06) : 1665 - 1681
  • [24] Stylistic Document Retrieval for Turkish
    Zamalieva, Daniya
    Kalaycilar, Firat
    Kale, Asli
    Pehlivan, Selen
    Can, Fazli
    2009 24TH INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2009 : 661 - 665
  • [25] Evaluation of the Makam Scale Theory of Arel for Music Information Retrieval on Traditional Turkish Art Music
    Gedik, Ali C.
    Bozkurt, Baris
    JOURNAL OF NEW MUSIC RESEARCH, 2009, 38 (02) : 103 - 116
  • [26] Boosting the Capacity of Diacritics-Based Methods for Information Hiding in Arabic Text
    Bensaad, Mohammed Lahcen
    Yagoubi, Mohammed Bachir
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2013, 38 : 2035 - 2041
  • [27] Questionnaire mode effects in interactive information retrieval experiments
    Kelly, Diane
    Harper, David J.
    Landau, Brian
    INFORMATION PROCESSING & MANAGEMENT, 2008, 44 (01) : 122 - 141
  • [28] Turkish Broadcast News Transcription and Retrieval
    Arisoy, Ebru
    Can, Dogan
    Parlak, Siddika
    Sak, Hasim
    Saraclar, Murat
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (05) : 874 - 883
  • [29] Effectiveness of stemming for Turkish text retrieval
    Ekmekçioglu, FÇ
    Willett, P
    PROGRAM-ELECTRONIC LIBRARY AND INFORMATION SYSTEMS, 2000, 34 (02) : 195 - 200
  • [30] INFORMATION RETRIEVAL
    GARFIELD, E
    SCIENCE, 1967, 156 (3780) : 1398 - &