Synthetic Error Dataset Generation Mimicking Bengali Writing Pattern

Authors
Sifat, Md Habibur Rahman [1]
Rahman, Chowdhury Rafeed [1]
Rafsan, Mohammad [1]
Rahman, Hasibur [1]
Affiliations
[1] United Int Univ, Dhaka, Bangladesh
Keywords
Bengali error dataset; Phonetically similar; Consonant cluster; Spell checker
DOI
Not available
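Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809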
Abstract
When writing Bengali with an English keyboard, users often make spelling mistakes. The accuracy of any Bengali spell checker or paragraph correction module depends largely on the error dataset it is built on. Generating such an error dataset manually is cumbersome. In this research, we present an algorithm that automatically generates misspelled Bengali words from correct words by analyzing Bengali writing patterns on a QWERTY-layout English keyboard. As part of our analysis, we have compiled a list of the most commonly used Bengali words, phonetically similar replaceable clusters, frequently mispressed replaceable clusters, frequently mispressed insertion-prone clusters, and a set of rules for handling Juktakkhar (consonant letter clusters) when generating errors.
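The abstract does not give the algorithm's details, so the following Python sketch only illustrates the general idea of cluster-based error generation. The cluster tables, the function names (`in_juktakkhar`, `build_substitutions`, `generate_errors`), and the rule of leaving Juktakkhar untouched are all assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch only: the tables and rules below are hypothetical
# and are NOT the cluster lists or Juktakkhar rules from the paper.

HASANTA = "\u09cd"  # Bengali virama; it joins consonants into a Juktakkhar

# Assumed phonetically similar replaceable clusters.
PHONETIC_CLUSTERS = [
    ("শ", "ষ", "স"),
    ("ন", "ণ"),
    ("ি", "ী"),
    ("ু", "ূ"),
]

# Assumed mispressed replaceable clusters (n and m are adjacent QWERTY keys).
MISPRESSED_CLUSTERS = [("ন", "ম")]

# Assumed insertion-prone clusters: a stray letter typed after the key letter.
INSERTION_PRONE = {"ম": ["ন"]}


def in_juktakkhar(word, i):
    """Return True if position i is a hasanta or directly touches one."""
    if word[i] == HASANTA:
        return True
    before = i > 0 and word[i - 1] == HASANTA
    after = i + 1 < len(word) and word[i + 1] == HASANTA
    return before or after


def build_substitutions(clusters):
    """Map each letter to the other members of every cluster it belongs to."""
    table = {}
    for cluster in clusters:
        for ch in cluster:
            table.setdefault(ch, set()).update(c for c in cluster if c != ch)
    return table


def generate_errors(word):
    """Return all single-edit misspellings of word, sorted by code point."""
    subs = build_substitutions(PHONETIC_CLUSTERS + MISPRESSED_CLUSTERS)
    variants = set()
    for i, ch in enumerate(word):
        # Assumed Juktakkhar rule: never edit inside a conjunct.
        if in_juktakkhar(word, i):
            continue
        for alt in subs.get(ch, ()):
            variants.add(word[:i] + alt + word[i + 1:])
        for extra in INSERTION_PRONE.get(ch, ()):
            variants.add(word[:i + 1] + extra + word[i + 1:])
    variants.discard(word)
    return sorted(variants)


if __name__ == "__main__":
    for correct in ["নদী", "আম", "বিশ্ব"]:
        print(correct, generate_errors(correct))
```

The sketch enumerates every single-edit variant rather than sampling at random, which keeps the output reproducible. A real dataset generator would probably weight edits by how often each mistake occurs.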
Pages: 1363-1366
Page count: 4