2LSPE: 2D Learnable Sinusoidal Positional Encoding using Transformer for Scene Text Recognition

Cited by: 9
Authors
Raisi, Zobeir [1]
Naiel, Mohamed A. [1]
Younes, Georges [1]
Wardell, Steven [2]
Zelek, John [1]
Affiliations
[1] Univ Waterloo, Waterloo, ON N2L 3G1, Canada
[2] ATS Automat Tooling Syst Inc, Cambridge, ON, Canada
Keywords
Transformer; 2D Learnable Sinusoidal Positional Encoding; Irregular Text; Scene Text Recognition; NETWORK;
DOI
10.1109/CRV52889.2021.00024
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Positional Encoding (PE) plays a vital role in a Transformer's ability to capture the order of sequential information, allowing it to overcome the permutation equivariance property. Recent state-of-the-art Transformer-based scene text recognition methods have leveraged the advantages of the 2D form of PE with fixed sinusoidal frequencies, also known as 2SPE, to better encode the 2D spatial dependencies of characters in a scene text image. These 2SPE-based Transformer frameworks have outperformed Recurrent Neural Network (RNN) based methods, mostly on recognizing text of arbitrary shapes; however, they are not tailored to the type of data and classification task at hand. In this paper, we extend a recent Learnable Sinusoidal frequencies PE (LSPE) from 1D to 2D, which we hereafter refer to as 2LSPE, and study how to adaptively choose the sinusoidal frequencies from the input training data. Moreover, we show how to apply the proposed Transformer architecture for scene text recognition. We compare our method against 11 state-of-the-art methods and show that it outperforms them in over 50% of the standard tests and is no worse than the second-best performer, whereas we outperform all other methods on irregular text datasets (i.e., non-horizontal or vertical layouts). Experimental results demonstrate that the proposed method offers higher word recognition accuracy (WRA) than two recent Transformer-based methods and eleven state-of-the-art RNN-based techniques on four challenging irregular-text recognition datasets, all while maintaining the highest WRA values on the regular-text datasets.
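The abstract does not give the exact 2LSPE formulation, so the following is only a minimal sketch of the general idea: a 2D sinusoidal positional encoding whose per-axis frequencies are passed in as arguments rather than fixed. The function names (`default_frequencies`, `sinusoidal_pe_2d`), the sin/cos channel layout, and the base of 10000 are assumptions borrowed from the common fixed sinusoidal convention, not details confirmed by the paper. In a learnable variant, `freqs_h` and `freqs_w` would be trainable parameters updated during training; here they are plain lists.

```python
import math


def default_frequencies(n, base=10000.0):
    # Fixed geometric frequency schedule (assumed convention, not from the paper).
    return [base ** (-i / n) for i in range(n)]


def sinusoidal_pe_2d(height, width, d_model, freqs_h=None, freqs_w=None):
    """Return a height x width x d_model nested list of positional encodings.

    A quarter of the channels encode sin(y * f), a quarter cos(y * f),
    a quarter sin(x * f), and a quarter cos(x * f). Passing freqs_h / freqs_w
    stands in for frequencies that a model would learn from data.
    """
    if d_model % 4 != 0:
        raise ValueError("d_model must be divisible by 4")
    n = d_model // 4
    freqs_h = default_frequencies(n) if freqs_h is None else list(freqs_h)
    freqs_w = default_frequencies(n) if freqs_w is None else list(freqs_w)
    if len(freqs_h) != n or len(freqs_w) != n:
        raise ValueError("each frequency list must have d_model // 4 entries")

    pe = []
    for y in range(height):
        row = []
        for x in range(width):
            vec = (
                [math.sin(y * f) for f in freqs_h]
                + [math.cos(y * f) for f in freqs_h]
                + [math.sin(x * f) for f in freqs_w]
                + [math.cos(x * f) for f in freqs_w]
            )
            row.append(vec)
        pe.append(row)
    return pe


# Fixed frequencies (the 2SPE-style baseline in this sketch).
fixed = sinusoidal_pe_2d(4, 6, 8)
# Hypothetical "learned" frequencies supplied by hand for illustration.
tuned = sinusoidal_pe_2d(4, 6, 8, freqs_h=[0.5, 0.1], freqs_w=[0.5, 0.1])
```

Keeping separate frequency lists for the vertical and horizontal axes lets the two directions be scaled independently, which is one plausible way to adapt the encoding to text images that are much wider than they are tall.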
Pages: 119-126
Page count: 8