End-to-end semi-supervised approach with modulated object queries for table detection in documents

Authors
Ehsan, Iqraa [1]
Shehzadi, Tahira [1, 2, 3]
Stricker, Didier [1, 2, 3]
Afzal, Muhammad Zeshan [1, 2, 3]
Affiliations
[1] Tech Univ Kaiserslautern, Dept Comp Sci, D-67663 Kaiserslautern, Germany
[2] Tech Univ Kaiserslautern, Mindgarage, D-67663 Kaiserslautern, Germany
[3] German Res Inst Artificial Intelligence DFKI, Comp Vis, D-67663 Kaiserslautern, Germany
Keywords
Table detection; Document analysis; Semi-supervised learning; Detection transformer
DOI
10.1007/s10032-024-00471-0
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Table detection, a pivotal task in document analysis, aims to precisely recognize and locate tables within document images. Although deep learning has made remarkable progress in this area, it typically requires a large labeled dataset for effective training. Current CNN-based semi-supervised table detection approaches rely on anchor generation and non-maximum suppression in their detection process, which limits training efficiency. Meanwhile, transformer-based semi-supervised techniques adopt a one-to-one matching strategy that yields noisy pseudo-labels, limiting overall efficiency. This study presents a transformer-based semi-supervised table detector. It improves the quality of pseudo-labels through a novel matching strategy that combines one-to-one and one-to-many assignment techniques. This approach significantly enhances training efficiency during the early stages, ensuring better pseudo-labels for further training. Our semi-supervised approach is comprehensively evaluated on benchmark datasets, including PubLayNet, ICDAR-19, and TableBank. It achieves new state-of-the-art results, with mAP of 95.7% and 97.9% on TableBank (word) and PubLayNet, respectively, with 30% labeled data, a 7.4- and 7.6-point improvement over the previous semi-supervised table detection approach. The results show that our semi-supervised approach surpasses all existing state-of-the-art methods by substantial margins. This research represents a significant advancement in semi-supervised table detection methods, offering a more efficient and accurate solution for practical document analysis tasks.
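The difference between the two assignment techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cost matrix, the function names, the top-k rule for one-to-many assignment, and the step-based switch between the two matchers are all assumptions made for illustration. The one-to-one matcher uses brute-force search, which is only practical for tiny examples; detection transformers normally use the Hungarian algorithm for this step.

```python
from itertools import permutations


def one_to_one_match(cost):
    """Give each target box (column) exactly one distinct query (row),
    minimising the total cost. Brute force, for tiny examples only."""
    n_queries, n_targets = len(cost), len(cost[0])
    best_total, best_rows = float("inf"), None
    for rows in permutations(range(n_queries), n_targets):
        total = sum(cost[r][t] for t, r in enumerate(rows))
        if total < best_total:
            best_total, best_rows = total, rows
    return {t: [r] for t, r in enumerate(best_rows)}


def one_to_many_match(cost, k=2):
    """Give each target box its k lowest-cost queries (assumed top-k rule)."""
    n_queries, n_targets = len(cost), len(cost[0])
    return {
        t: sorted(range(n_queries), key=lambda r: cost[r][t])[:k]
        for t in range(n_targets)
    }


def select_matcher(step, switch_step):
    """Hypothetical schedule: one-to-many early in training, one-to-one later."""
    return one_to_many_match if step < switch_step else one_to_one_match


# Hypothetical matching costs: 4 object queries (rows) x 2 target boxes (columns).
cost = [
    [0.2, 0.9],
    [0.3, 0.8],
    [0.7, 0.1],
    [0.6, 0.4],
]

early = select_matcher(step=0, switch_step=100)(cost)
late = select_matcher(step=500, switch_step=100)(cost)
print(early)  # {0: [0, 1], 1: [2, 3]}
print(late)   # {0: [0], 1: [2]}
```

In this toy setting, the one-to-many stage supervises several queries per target box, while the one-to-one stage leaves a single query per box, so no duplicate-removal step such as non-maximum suppression is needed.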
Pages: 363-378
Number of pages: 16