Contrastive Training Improves Zero-Shot Classification of Semi-structured Documents

Authors
Khalifa, Muhammad [1]
Vyas, Yogarshi [2]
Wang, Shuai [2]
Horwood, Graham [2]
Mallya, Sunil
Ballesteros, Miguel [2]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] AWS AI Labs, Seattle, WA 98019 USA
Keywords
LABEL
DOI
Not available
Abstract
We investigate semi-structured document classification in a zero-shot setting. Classification of semi-structured documents is more challenging than that of standard unstructured documents, as positional, layout, and style information plays a vital role in interpreting such documents. The standard classification setting, in which categories are fixed during both training and testing, falls short in dynamic environments where new document categories can emerge. We focus exclusively on the zero-shot setting, where inference is done on new, unseen classes. To address this task, we propose a matching-based approach that relies on a pairwise contrastive objective for both pretraining and fine-tuning. Our results show a significant boost in Macro F1 from the proposed pretraining step in both supervised and unsupervised zero-shot settings.
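The abstract names a matching-based approach, a pairwise contrastive objective, and Macro F1, but gives no formulas. The sketch below is only an illustration of those three ideas under stated assumptions: documents and class descriptions are taken to be already encoded as plain vectors, the matching score is assumed to be cosine similarity, and the loss is a generic margin-based pairwise contrastive loss. The function names, the `margin` parameter, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed matching score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_contrastive_loss(doc_vec, class_vec, is_match, margin=0.5):
    """Generic margin-based pairwise loss (an assumption, not the paper's exact loss).

    Matching pairs are pulled toward similarity 1; non-matching pairs are
    penalized only when their similarity exceeds the margin.
    """
    s = cosine(doc_vec, class_vec)
    if is_match:
        return (1.0 - s) ** 2
    return max(0.0, s - margin) ** 2


def zero_shot_predict(doc_vec, class_vecs):
    """Pick the class whose vector best matches the document.

    class_vecs maps class name -> vector; the classes may be unseen in training.
    """
    return max(class_vecs, key=lambda name: cosine(doc_vec, class_vecs[name]))


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy, made-up vectors standing in for encoder outputs.
    unseen_classes = {"invoice": [1.0, 0.0], "resume": [0.0, 1.0]}
    print(zero_shot_predict([0.9, 0.1], unseen_classes))
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Because prediction only compares a document against class vectors, adding a new category in this sketch means adding one more entry to `class_vecs`, with no retraining of a fixed output layer.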
Pages: 7499-7508
Number of pages: 10