Low-Density Language Bootstrapping: The Case of Tajiki Persian

Authors
Megerdoomian, Karine [1]
Parvaz, Dan [1]
Affiliation
[1] MITRE Corp, McLean, VA 22102 USA
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Low-density languages raise difficulties for standard approaches to natural language processing that depend on large online corpora. Using Persian as a case study, we propose a novel method for bootstrapping MT capability for a low-density language when it is related to a higher-density variant. Tajiki Persian is a low-density language written in the Cyrillic alphabet, while Iranian Persian (Farsi) is written in an extended version of the Arabic script and has many computational resources available. Despite the orthographic differences, the literary written forms of the two languages are almost identical. The paper describes the development of a comprehensive finite-state transducer that converts Tajiki text to Farsi script, after which the transliterated document is run through an existing Persian-to-English MT system. Because divergences arise in mapping between the two writing systems, along with phonological and lexical distinctions, the system uses contextual cues (such as the position of a phoneme in a word) as well as available Farsi resources (a morphological analyzer to handle differences in affixal structure and a lexicon to disambiguate the analyses) to control the potential combinatorial explosion. The results point to a valuable strategy for the rapid prototyping of MT packages for languages of similarly uneven density.
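The transliteration idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' transducer: it replaces the finite-state machinery with plain candidate enumeration, and the letter table, the positional rules in `options`, and the four-word `LEXICON` are simplified assumptions chosen for illustration only. It shows how position-dependent mappings and a target-language lexicon can together prune the candidate spellings.

```python
from itertools import product

# Perso-Arabic letters by code point (avoids visually similar characters).
ALEF, ALEF_MADDA, BE, TE, TA = "\u0627", "\u0622", "\u0628", "\u062a", "\u0637"
KHE, DAL, SIN, SAD, SE = "\u062e", "\u062f", "\u0633", "\u0635", "\u062b"
KAF, NUN, HE, YE = "\u06a9", "\u0646", "\u0647", "\u06cc"

# Assumed, heavily simplified consonant table: one Cyrillic letter may
# correspond to several Perso-Arabic letters.
CONSONANTS = {
    "б": [BE], "д": [DAL], "к": [KAF], "н": [NUN], "х": [KHE],
    "т": [TE, TA], "с": [SIN, SAD, SE],
}

def options(ch, i, n):
    """Candidate spellings for letter ch at index i of an n-letter word."""
    first, last = i == 0, i == n - 1
    if ch == "о":                      # long vowel: alef, with madda word-initially
        return [ALEF_MADDA] if first else [ALEF]
    if ch == "а":                      # short vowel: usually unwritten medially
        if first:
            return [ALEF]
        return [HE, ""] if last else [""]
    if ch == "и":                      # may be unwritten or written as ye
        return ["", YE]
    return CONSONANTS.get(ch, [ch])    # unknown letters pass through unchanged

def candidates(word):
    """Enumerate every spelling allowed by the per-letter options."""
    per_letter = [options(ch, i, len(word)) for i, ch in enumerate(word)]
    return ["".join(parts) for parts in product(*per_letter)]

def transliterate(word, lexicon):
    """Keep only the candidates attested in the target-language lexicon."""
    return [c for c in candidates(word) if c in lexicon]

# Hypothetical toy lexicon: "book", "house", "water", "hand".
LEXICON = {
    KAF + TE + ALEF + BE,
    KHE + ALEF + NUN + HE,
    ALEF_MADDA + BE,
    DAL + SIN + TE,
}

for tajiki in ["китоб", "хона", "об", "даст"]:
    print(tajiki, len(candidates(tajiki)), transliterate(tajiki, LEXICON))
```

Even in this toy setting a four-letter word such as "даст" yields six candidate spellings, which suggests why the abstract emphasizes contextual cues and existing Farsi resources for keeping the search space manageable.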
Pages: 3293-3298
Page count: 6