Synthetic Error Dataset Generation Mimicking Bengali Writing Pattern

Times cited: 0
Authors
Sifat, Md Habibur Rahman [1]
Rahman, Chowdhury Rafeed [1]
Rafsan, Mohammad [1]
Rahman, Hasibur [1]
Affiliation
[1] United Int Univ, Dhaka, Bangladesh
Keywords
Bengali error dataset; Phonetically similar; Consonant cluster; Spell checker;
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract
When writing Bengali with an English keyboard, users often make spelling mistakes. The accuracy of any Bengali spell checker or paragraph correction module depends largely on the error dataset it is built on, and generating such a dataset manually is a cumbersome process. In this research, we present an algorithm that automatically generates misspelled Bengali words from correct words by analyzing Bengali writing patterns on a QWERTY-layout English keyboard. As part of our analysis, we compiled a list of the most commonly used Bengali words, phonetically similar replaceable clusters, frequently mispressed replaceable clusters, and frequently mispressed insertion-prone clusters, along with rules for handling Juktakkhar (consonant clusters) while generating errors.
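The abstract names the ingredients of the generator but not their contents. Below is a minimal Python sketch of how single-substitution error generation with conjunct handling might look. Everything in it is an assumption for illustration: the `PHONETIC_SUBS` and `MISPRESS_SUBS` tables are hypothetical examples (not the authors' cluster lists), the mispress table assumes a phonetic layout where `b` types ব, `v` types ভ, `t` types ত and `r` types র, and the Juktakkhar rule shown (never substitute inside a conjunct) is one plausible rule, not necessarily the paper's. Words are split into units so that a consonant joined by the hasanta (U+09CD) to the next consonant stays a single unit.

```python
from typing import Dict, List

# Bengali hasanta (virama); it joins consonants into a conjunct (juktakkhar).
VIRAMA = "\u09cd"

# Hypothetical phonetically similar substitutions (illustrative only).
PHONETIC_SUBS: Dict[str, List[str]] = {
    "ি": ["ী"], "ী": ["ি"],  # short/long i vowel signs
    "ু": ["ূ"], "ূ": ["ু"],  # short/long u vowel signs
    "শ": ["ষ", "স"], "ষ": ["শ", "স"], "স": ["শ", "ষ"],  # the three sibilants
    "ন": ["ণ"], "ণ": ["ন"],
    "জ": ["য"], "য": ["জ"],
}

# Hypothetical mispress substitutions, assuming b->ব, v->ভ, t->ত, r->র
# on a phonetic layout (b/v and r/t are adjacent keys on QWERTY).
MISPRESS_SUBS: Dict[str, List[str]] = {
    "ব": ["ভ"], "ভ": ["ব"],
    "ত": ["র"], "র": ["ত"],
}


def split_units(word: str) -> List[str]:
    """Split a word into units, keeping each conjunct (C + hasanta + C ...) whole."""
    units: List[str] = []
    i = 0
    while i < len(word):
        j = i + 1
        # Absorb every "hasanta + next character" pair into the current unit.
        while j + 1 < len(word) and word[j] == VIRAMA:
            j += 2
        units.append(word[i:j])
        i = j
    return units


def generate_errors(word: str) -> List[str]:
    """Return misspelled variants, each differing from `word` in exactly one unit."""
    units = split_units(word)
    seen = {word}
    variants: List[str] = []
    for idx, unit in enumerate(units):
        # Assumed juktakkhar rule: leave conjunct units untouched.
        if VIRAMA in unit:
            continue
        for table in (PHONETIC_SUBS, MISPRESS_SUBS):
            for sub in table.get(unit, []):
                candidate = "".join(units[:idx] + [sub] + units[idx + 1:])
                if candidate not in seen:
                    seen.add(candidate)
                    variants.append(candidate)
    return variants


if __name__ == "__main__":
    print(generate_errors("বিশ্ব"))
    print(generate_errors("কিন্তু"))
```

For example, `generate_errors("কিন্তু")` yields variants with ী and ূ swapped in, but never touches the ন in the conjunct ন্ত. Insertion-prone clusters could be handled the same way by inserting a string after a matched unit instead of replacing it.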
Pages: 1363–1366
Page count: 4