Writer Retrieval using Compact Convolutional Transformers and NetMVLAD

Cited by: 2
Authors
Peer, Marco [1]
Kleber, Florian [1]
Sablatnig, Robert [1]
Affiliations
[1] TU Wien, Inst Visual Comp & Human Ctr Technol, Comp Vis Lab, Vienna, Austria
Source
2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2022
Keywords
IDENTIFICATION; FEATURES;
DOI
10.1109/ICPR56361.2022.9956155
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a method for writer retrieval in which embeddings of patches extracted at SIFT keypoint locations are learned by a Compact Convolutional Transformer (CCT), a modified attention-based transformer architecture that includes convolutions, followed by a NetMVLAD layer and Generalized Max Pooling (GMP) to obtain global page descriptors. We introduce the application of CCTs to writer retrieval and show that they outperform the Convolutional Neural Networks (CNNs) used in current state-of-the-art methods for writer retrieval, namely ResNet18, while having only one-third of the parameters. Additionally, we propose NetMVLAD, an extension of NetVLAD with multiple vocabularies that encodes information with different vocabulary sizes and improves on the original NetVLAD. The performance of CCTs is compared with that of ResNet18 on the ICDAR2013 Competition on Writer Identification dataset (ICDAR2013) and the CVL dataset, and the effect of multiple vocabularies within the NetVLAD layer is shown. CCT7 pretrained on CIFAR100 and combined with NetMVLAD achieves 89.3% Mean Average Precision (mAP) on the ICDAR2013 dataset and 96.5% on the CVL dataset.
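The abstract reports results as Mean Average Precision (mAP) over global page descriptors. As a rough illustration of how such a retrieval score can be computed, the following is a minimal sketch and not the authors' evaluation code. It assumes a leave-one-out protocol (each page queries all other pages) and cosine distance between descriptors; both are assumptions, since the abstract does not state them. The function names and the toy data are hypothetical.

```python
from math import sqrt


def cosine_distance(a, b):
    """Cosine distance between two descriptor vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def average_precision(ranked_relevance):
    """Average precision for one query, given a ranked list of booleans."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(descriptors, writer_ids):
    """Leave-one-out mAP: every page is used once as the query."""
    aps = []
    for q in range(len(descriptors)):
        # Rank all other pages by distance to the query page.
        others = [i for i in range(len(descriptors)) if i != q]
        others.sort(key=lambda i: cosine_distance(descriptors[q], descriptors[i]))
        relevance = [writer_ids[i] == writer_ids[q] for i in others]
        # Queries without any other page by the same writer are skipped.
        if any(relevance):
            aps.append(average_precision(relevance))
    return sum(aps) / len(aps)


# Toy page descriptors (hypothetical): two writers, two pages each.
toy_descriptors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_writers = [0, 0, 1, 1]
toy_map = mean_average_precision(toy_descriptors, toy_writers)
```

In this toy case each query ranks its same-writer page first, so every per-query average precision is perfect. Real page descriptors would come from the aggregation stage described in the abstract.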
Pages: 1571-1578
Page count: 8
Related papers
50 in total
  • [41] Production of computer-generated holograms on recordable compact disk media using a compact disk writer
    Cable, A
    Mash, P
    Wilkinson, T
    OPTICAL ENGINEERING, 2003, 42 (09) : 2514 - 2520
  • [42] Tomato maturity recognition with convolutional transformers
    Khan, Asim
    Hassan, Taimur
    Shafay, Muhammad
    Fahmy, Israa
    Werghi, Naoufel
    Mudigansalage, Seneviratne
    Hussain, Irfan
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [43] Information retrieval based writer identification
    Bensefia, A
    Paquet, T
    Heutte, L
    SEVENTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2003, : 946 - 950
  • [44] Tomato maturity recognition with convolutional transformers
    Asim Khan
    Taimur Hassan
    Muhammad Shafay
    Israa Fahmy
    Naoufel Werghi
    Seneviratne Mudigansalage
    Irfan Hussain
    Scientific Reports, 13
  • [45] Writer identification and writer retrieval based on NetVLAD with Re-ranking
    Rasoulzadeh, Shervin
    BabaAli, Bagher
    IET BIOMETRICS, 2022, 11 (01) : 10 - 22
  • [46] Boosting vision transformers for image retrieval
    Song, Chull Hwan
    Yoon, Jooyoung
    Choi, Shunghyun
    Avrithis, Yannis
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 107 - 117
  • [47] Content Based Video Retrieval Using Convolutional Neural Network
    Iqbal, Saeed
    Qureshi, Adnan N.
    Lodhi, Awais M.
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 1, 2019, 868 : 170 - 186
  • [48] Thai Recipe Retrieval Application Using Convolutional Neural Network
    Phophan, Thitiwut
    Khuthanon, Rungwaraporn
    Chantamit-O-Pas, Pattanapong
    COOPERATIVE DESIGN, VISUALIZATION, AND ENGINEERING, CDVE 2022, 2022, 13492 : 135 - 146
  • [49] Medical image retrieval using deep convolutional neural network
    Qayyum, Adnan
    Anwar, Syed Muhammad
    Awais, Muhammad
    Majid, Muhammad
    NEUROCOMPUTING, 2017, 266 : 8 - 20
  • [50] Automatic Detection and Classification of Cardiovascular Disorders Using Phonocardiogram and Convolutional Vision Transformers
    Abbas, Qaisar
    Hussain, Ayyaz
    Baig, Abdul Rauf
    DIAGNOSTICS, 2022, 12 (12)