Writer Retrieval using Compact Convolutional Transformers and NetMVLAD

Cited by: 2
Authors
Peer, Marco [1]
Kleber, Florian [1]
Sablatnig, Robert [1]
Affiliations
[1] TU Wien, Inst Visual Comp & Human Ctr Technol, Comp Vis Lab, Vienna, Austria
Keywords
IDENTIFICATION; FEATURES;
DOI
10.1109/ICPR56361.2022.9956155
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a method for writer retrieval in which embeddings of patches extracted at SIFT keypoint locations are learned by a Compact Convolutional Transformer (CCT), a modified attention-based transformer architecture that includes convolutions, followed by a NetMVLAD layer and Generalized Max Pooling (GMP) to obtain global page descriptors. We introduce the application of CCTs to writer retrieval and show that they outperform ResNet18, the Convolutional Neural Network (CNN) used in current state-of-the-art methods for writer retrieval, while having only one-third of the parameters. Additionally, we propose NetMVLAD, an extension of NetVLAD with multiple vocabularies that encodes information with different vocabulary sizes and improves on the original NetVLAD. CCTs are compared with ResNet18 on the ICDAR2013 Competition on Writer Identification dataset (ICDAR2013) and the CVL dataset, and the effect of multiple vocabularies applied within the NetVLAD layer is shown. CCT7 pretrained on CIFAR100 combined with NetMVLAD achieves 89.3% Mean Average Precision (mAP) on the ICDAR2013 dataset and 96.5% on the CVL dataset.
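To illustrate the idea of aggregating patch embeddings over several vocabularies of different sizes, the following is a minimal sketch and not the authors' implementation. It assumes fixed (not learned) centroids, a distance-based soft assignment with a hypothetical sharpness parameter `alpha`, plain sum pooling in place of GMP, and concatenation of the per-vocabulary descriptors; none of these details are confirmed by the abstract. The function names `soft_vlad` and `multi_vocab_vlad` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (zero vectors stay zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def soft_vlad(embeddings, centroids, alpha=10.0):
    """VLAD-style aggregation with soft assignment for ONE vocabulary.

    Assumption: centroids are fixed here; in NetVLAD they are learned.
    """
    dim = len(centroids[0])
    residual_sums = [[0.0] * dim for _ in centroids]
    for x in embeddings:
        # Soft assignment: closer centroids receive larger weights.
        scores = [
            -alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            for c in centroids
        ]
        weights = softmax(scores)
        # Accumulate weighted residuals (sum pooling, not GMP).
        for k, (w, c) in enumerate(zip(weights, centroids)):
            for d in range(dim):
                residual_sums[k][d] += w * (x[d] - c[d])
    # Normalize each cluster's residual, flatten, then normalize globally.
    flat = [v for row in residual_sums for v in l2_normalize(row)]
    return l2_normalize(flat)


def multi_vocab_vlad(embeddings, vocabularies, alpha=10.0):
    """Concatenate descriptors from several vocabularies (assumed scheme)."""
    parts = []
    for centroids in vocabularies:
        parts.extend(soft_vlad(embeddings, centroids, alpha))
    return l2_normalize(parts)


# Toy usage: three 2-D patch embeddings, vocabularies of size 2 and 3.
patch_embeddings = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
vocabularies = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]],
]
page_descriptor = multi_vocab_vlad(patch_embeddings, vocabularies)
print(len(page_descriptor))  # 10 = (2 + 3) clusters * 2 dimensions
```

In this sketch the page descriptor length grows with the total number of clusters across all vocabularies, so two pages can be compared with a simple distance between their descriptors.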
Pages: 1571-1578
Page count: 8
Related papers
50 in total
  • [1] Writer Identification and Retrieval Using a Convolutional Neural Network
    Fiel, Stefan
    Sablatnig, Robert
    COMPUTER ANALYSIS OF IMAGES AND PATTERNS, CAIP 2015, PT II, 2015, 9257 : 26 - 37
  • [2] Self-supervised Vision Transformers for Writer Retrieval
    Raven, Tim
    Matei, Arthur
    Fink, Gernot A.
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II, 2024, 14805 : 380 - 396
  • [3] A music recommender system based on compact convolutional transformers
    Pourmoazemi, Negar
    Maleki, Sepehr
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 255
  • [4] Self-supervised Vision Transformers with Data Augmentation Strategies Using Morphological Operations for Writer Retrieval
    Peer, Marco
    Kleber, Florian
    Sablatnig, Robert
    FRONTIERS IN HANDWRITING RECOGNITION, ICFHR 2022, 2022, 13639 : 122 - 136
  • [5] Writer Identification and Writer Retrieval Using Vision Transformer for Forensic Documents
    Koepf, Michael
    Kleber, Florian
    Sablatnig, Robert
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 352 - 366
  • [6] Writer Identification and Writer Retrieval using the Fisher Vector on Visual Vocabularies
    Fiel, Stefan
    Sablatnig, Robert
    2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013, : 545 - 549
  • [7] Deepfake detection using convolutional vision transformers and convolutional neural networks
    Soudy, Ahmed Hatem
    Sayed, Omnia
    Tag-Elser, Hala
    Ragab, Rewaa
    Mohsen, Sohaila
    Mostafa, Tarek
    Abohany, Amr A.
    Slim, Salwa O.
    Neural Computing and Applications, 2024, 36 (31) : 19759 - 19775
  • [8] Compact descriptors for sketch-based image retrieval using a triplet loss convolutional neural network
    Bui, T.
    Ribeiro, L.
    Ponti, M.
    Collomosse, J.
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2017, 164 : 27 - 37
  • [9] Content-based image retrieval with compact deep convolutional features
    Alzu'bi, Ahmad
    Amira, Abbes
    Ramzan, Naeem
    NEUROCOMPUTING, 2017, 249 : 95 - 105
  • [10] CCTCOVID: COVID-19 detection from chest X-ray images using Compact Convolutional Transformers
    Marefat, Abdolreza
    Marefat, Mahdieh
    Hassannataj Joloudari, Javad
    Nematollahi, Mohammad Ali
    Lashgari, Reza
    FRONTIERS IN PUBLIC HEALTH, 2023, 11