Contrastive Training Improves Zero-Shot Classification of Semi-structured Documents

Cited by: 0
Authors
Khalifa, Muhammad [1]
Vyas, Yogarshi [2]
Wang, Shuai [2]
Horwood, Graham [2]
Mallya, Sunil
Ballesteros, Miguel [2]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] AWS AI Labs, Seattle, WA 98019 USA
Keywords
LABEL
DOI
Not available
Abstract
We investigate semi-structured document classification in a zero-shot setting. Classification of semi-structured documents is more challenging than that of standard unstructured documents, as positional, layout, and style information plays a vital role in interpreting such documents. The standard classification setting, where categories are fixed during both training and testing, falls short in dynamic environments where new document categories may emerge. We focus exclusively on the zero-shot setting, where inference is done on new, unseen classes. To address this task, we propose a matching-based approach that relies on a pairwise contrastive objective for both pretraining and fine-tuning. Our results show a significant boost in Macro F1 from the proposed pretraining step in both supervised and unsupervised zero-shot settings.
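As a rough illustration of what a matching-based approach with a pairwise contrastive objective can look like, the sketch below scores a document embedding against a label embedding and applies a binary cross-entropy loss to the pair. It is a generic, minimal sketch and not the authors' implementation: the toy embeddings, the `scale` parameter, and the function names `cosine`, `pairwise_contrastive_loss`, and `predict` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pairwise_contrastive_loss(doc_vec, label_vec, is_match, scale=5.0):
    """Binary cross-entropy on a scaled similarity for one (document, label) pair.

    `scale` is a hypothetical temperature-like constant, not a value from the paper.
    """
    p = 1.0 / (1.0 + math.exp(-scale * cosine(doc_vec, label_vec)))
    return -math.log(p) if is_match else -math.log(1.0 - p)


def predict(doc_vec, label_vecs):
    """Zero-shot inference: return the label whose embedding best matches the document."""
    return max(label_vecs, key=lambda name: cosine(doc_vec, label_vecs[name]))


# Toy embeddings (hypothetical); a real system would obtain these from an encoder.
doc = [0.9, 0.1, 0.0]
labels = {"invoice": [1.0, 0.0, 0.0], "resume": [0.0, 1.0, 0.0]}

best = predict(doc, labels)
loss_true_pair = pairwise_contrastive_loss(doc, labels["invoice"], is_match=True)
loss_false_pair = pairwise_contrastive_loss(doc, labels["resume"], is_match=False)
```

Because the label set is only consulted at inference time, new categories can be added to `labels` without retraining, which is the property the zero-shot setting relies on.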
Pages: 7499-7508
Page count: 10
Related papers
50 in total
  • [21] Semantic Contrastive Embedding for Generalized Zero-Shot Learning
    Han, Zongyan
    Fu, Zhenyong
    Chen, Shuo
    Yang, Jian
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (11) : 2606 - 2622
  • [22] CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification
    Sinha, Sankalp
    Khan, Muhammad Saif Ullah
    Sheikh, Talha Uddin
    Stricker, Didier
    Afzal, Muhammad Zeshan
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT IV, 2024, 14807 : 124 - 141
  • [23] Semi-structured documents mining: a review and comparison
    Madani, Amina
    Boussaid, Omar
    Zegour, Djamel Eddine
    17TH INTERNATIONAL CONFERENCE IN KNOWLEDGE BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS - KES2013, 2013, 22 : 330 - 339
  • [24] Towards the automated verification of semi-structured documents
    Weitl, Franz
    Jaksic, Mirjana
    Freitag, Burkhard
    DATA & KNOWLEDGE ENGINEERING, 2009, 68 (03) : 292 - 317
  • [25] The Benefits of Label-Description Training for Zero-Shot Text Classification
    Gao, Lingyu
    Ghosh, Debanjan
    Gimpel, Kevin
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023 : 13823 - 13844
  • [26] Zero-shot Text Classification via Reinforced Self-training
    Ye, Zhiquan
    Geng, Yuxia
    Chen, Jiaoyan
    Xu, Xiaoxiao
    Zheng, Suhang
    Wang, Feng
    Chen, Jingmin
    Zhang, Jun
    Chen, Huajun
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020 : 3014 - 3024
  • [27] Label Agnostic Pre-training for Zero-shot Text Classification
    Clarke, Christopher
    Heng, Yuzhao
    Kang, Yiping
    Flautner, Krisztian
    Tang, Lingjia
    Mars, Jason
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023 : 1009 - 1021
  • [28] Zero-Shot Turkish Text Classification
    Birim, Ahmet
    Erden, Mustafa
    Arslan, Levent M.
    29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [29] Latent Embeddings for Zero-shot Classification
    Xian, Yongqin
    Akata, Zeynep
    Sharma, Gaurav
    Nguyen, Quynh
    Hein, Matthias
    Schiele, Bernt
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016 : 69 - 77
  • [30] Zero-Shot Recognition via Structured Prediction
    Zhang, Ziming
    Saligrama, Venkatesh
    COMPUTER VISION - ECCV 2016, PT VII, 2016, 9911 : 533 - 548