Synthetic Error Dataset Generation Mimicking Bengali Writing Pattern

Cited by: 0
Authors
Sifat, Md Habibur Rahman [1]
Rahman, Chowdhury Rafeed [1]
Rafsan, Mohammad [1]
Rahman, Hasibur [1]
Affiliation
[1] United Int Univ, Dhaka, Bangladesh
Keywords
Bengali error dataset; Phonetically similar; Consonant cluster; Spell checker
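DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809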
Abstract
When writing Bengali on an English keyboard, users often make spelling mistakes. The accuracy of any Bengali spell checker or paragraph correction module depends largely on the error dataset it is built on. Generating such a dataset manually is cumbersome. In this research, we present an algorithm that automatically generates misspelled Bengali words from correct words by analyzing Bengali writing patterns on a QWERTY-layout English keyboard. As part of our analysis, we have compiled a list of the most commonly used Bengali words, phonetically similar replaceable clusters, frequently mispressed replaceable clusters, frequently mispressed insertion-prone clusters, and a set of rules for handling Juktakkhar (consonant letter clusters) when generating errors.
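The cluster-based generation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the three cluster tables, the function names, and the choice to operate on romanized (QWERTY-typed) strings are all assumptions made for illustration, and the Juktakkhar rules are left out because the record does not state them.

```python
# Illustrative sketch only. All tables below are hypothetical placeholders,
# not the cluster lists compiled in the paper.

# Phonetically similar replaceable clusters (assumed examples).
PHONETIC_CLUSTERS = {"sh": ["s"], "bh": ["b"]}
# Frequently mispressed replaceable clusters (assumed: adjacent QWERTY keys).
MISPRESSED_CLUSTERS = {"a": ["s"]}
# Frequently mispressed insertion-prone clusters (assumed: an extra adjacent key).
INSERTION_CLUSTERS = {"h": ["hj"]}


def substitute_each_occurrence(word, table):
    """Return every variant made by rewriting one cluster occurrence in word."""
    variants = set()
    for cluster, substitutes in table.items():
        start = word.find(cluster)
        while start != -1:
            end = start + len(cluster)
            for substitute in substitutes:
                variants.add(word[:start] + substitute + word[end:])
            start = word.find(cluster, start + 1)
    return variants


def generate_misspellings(word):
    """Collect single-edit misspellings of a romanized word from all tables."""
    variants = set()
    for table in (PHONETIC_CLUSTERS, MISPRESSED_CLUSTERS, INSERTION_CLUSTERS):
        variants |= substitute_each_occurrence(word, table)
    variants.discard(word)  # keep only strings that differ from the input
    return sorted(variants)


if __name__ == "__main__":
    # "bhasha" is an assumed romanization used purely as sample input.
    for misspelling in generate_misspellings("bhasha"):
        print(misspelling)
```

Each generated string differs from the correct word by exactly one cluster edit, so every pair (misspelling, correct word) could serve as one row of a synthetic error dataset. A fuller version would sample edits according to observed error frequencies and apply the Juktakkhar rules the abstract mentions.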
Pages: 1363-1366
Page count: 4