Statistical-based approach to non-segmented language processing

Times cited: 0
Authors
Sornlertlamvanich, Virach [1]
Charoenporn, Thatsanee
Tongchim, Shisanu
Kruengkrai, Canasai
Isahara, Hitoshi
Affiliations
[1] TCL, NICT Asia Res Ctr, Pathum Thani, Thailand
[2] NICT, Kyoto 6190289, Japan
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2007 / Vol. E90-D / No. 10
Keywords
non-segmented language; unified language processing; statistical approach; probability language identification; word extraction; search engine
DOI
10.1093/ietisy/e90-d.10.1565
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Several approaches have been studied to cope with the distinctive features of non-segmented languages. When there is no explicit information about word boundaries, segmenting an input text is a formidable task in language processing. Not only an up-to-date word list but also the usages of the words have to be maintained to cover their use in current texts. The accuracy and efficiency of higher-level processing rely heavily on this word boundary identification task. In this paper, we introduce several statistical-based approaches to tackle the problem of ambiguity in word segmentation. The word boundary identification problem is then defined as one part, among others, of performing unified language processing as a whole. To demonstrate the capability of unified language processing, we study three selected tasks: language identification, word extraction, and a dictionary-less search engine.
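The keywords mention probabilistic language identification, but this record does not describe the authors' model. The following is therefore only a minimal sketch of one common statistical technique, a character n-gram classifier with add-one (Laplace) smoothing. It needs no word segmentation, which is why this family of methods suits non-segmented scripts such as Thai. The class name, the smoothing choice, the bigram setting, and the toy training strings are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    # Pad with "_" so the start and end of the string also yield n-grams.
    padded = "_" * (n - 1) + text + "_" * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Hypothetical character n-gram language identifier (illustrative sketch,
    not the authors' method). Scores text by summed log probabilities under
    an add-one smoothed n-gram model per language and picks the best one."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.counts: dict[str, Counter] = {}
        self.totals: dict[str, int] = {}
        self.vocab: set[str] = set()

    def train(self, language: str, samples: list[str]) -> None:
        # Accumulate n-gram counts; no word boundaries are required.
        counter = self.counts.setdefault(language, Counter())
        for sample in samples:
            counter.update(char_ngrams(sample, self.n))
        self.totals[language] = sum(counter.values())
        self.vocab.update(counter)

    def log_prob(self, language: str, text: str) -> float:
        counter = self.counts[language]
        total = self.totals[language]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen n-grams
        # Counter returns 0 for unseen n-grams, so add-one smoothing applies.
        return sum(
            math.log((counter[g] + 1) / (total + v))
            for g in char_ngrams(text, self.n)
        )

    def identify(self, text: str) -> str:
        # Choose the language whose model gives the text the highest log probability.
        return max(self.counts, key=lambda lang: self.log_prob(lang, text))


# Toy training data (assumed for illustration only).
ident = NgramLanguageIdentifier(n=2)
ident.train("th", ["ภาษาไทยไม่มีการเว้นวรรคระหว่างคำ", "การตัดคำภาษาไทย"])
ident.train("en", ["english text has spaces between words", "word segmentation"])
print(ident.identify("ตัดคำ"))                      # th
print(ident.identify("word segmentation is hard"))  # en
```

Because the model works on raw character sequences, the same code handles spaced and unspaced scripts alike; a realistic system would need far larger training corpora and possibly a better smoothing method.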
Pages: 1565-1573
Number of pages: 9