Contrastive Training Improves Zero-Shot Classification of Semi-structured Documents

Authors
Khalifa, Muhammad [1]
Vyas, Yogarshi [2]
Wang, Shuai [2]
Horwood, Graham [2]
Mallya, Sunil
Ballesteros, Miguel [2]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] AWS AI Labs, Seattle, WA 98019 USA
Keywords
LABEL
DOI
Not available
Abstract
We investigate semi-structured document classification in a zero-shot setting. Classifying semi-structured documents is more challenging than classifying standard unstructured documents, because positional, layout, and style information plays a vital role in interpreting them. The standard classification setting, in which categories are fixed during both training and testing, falls short in dynamic environments where new document categories may emerge. We focus exclusively on the zero-shot setting, where inference is performed on new, unseen classes. To address this task, we propose a matching-based approach that relies on a pairwise contrastive objective for both pretraining and fine-tuning. Our results show a significant boost in Macro F1 from the proposed pretraining step in both supervised and unsupervised zero-shot settings.
Pages: 7499-7508
Page count: 10