End-to-end semi-supervised approach with modulated object queries for table detection in documents

Authors
Ehsan, Iqraa [1]
Shehzadi, Tahira [1, 2, 3]
Stricker, Didier [1, 2, 3]
Afzal, Muhammad Zeshan [1, 2, 3]
Affiliations
[1] Tech Univ Kaiserslautern, Dept Comp Sci, D-67663 Kaiserslautern, Germany
[2] Tech Univ Kaiserslautern, Mindgarage, D-67663 Kaiserslautern, Germany
[3] German Res Inst Artificial Intelligence DFKI, Comp Vis, D-67663 Kaiserslautern, Germany
Keywords
Table detection; Document analysis; Semi-supervised learning; Detection transformer
DOI
10.1007/s10032-024-00471-0
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Table detection, a pivotal task in document analysis, aims to precisely recognize and locate tables within document images. Although deep learning has shown remarkable progress in this realm, it typically requires an extensive labeled dataset for proficient training. Current CNN-based semi-supervised table detection approaches rely on anchor generation and non-maximum suppression in their detection process, limiting training efficiency. Meanwhile, transformer-based semi-supervised techniques adopt a one-to-one matching strategy that provides noisy pseudo-labels, limiting overall efficiency. This study presents an innovative transformer-based semi-supervised table detector. It improves the quality of pseudo-labels through a novel matching strategy combining one-to-one and one-to-many assignment techniques. This approach significantly enhances training efficiency during the early stages, ensuring superior pseudo-labels for further training. Our semi-supervised approach is comprehensively evaluated on benchmark datasets, including PubLayNet, ICDAR-19, and TableBank. It achieves new state-of-the-art results, with mAPs of 95.7% and 97.9% on TableBank (word) and PubLayNet with 30% labeled data, marking 7.4- and 7.6-point improvements, respectively, over the previous semi-supervised table detection approach. The results clearly show the superiority of our semi-supervised approach, surpassing all existing state-of-the-art methods by substantial margins. This research represents a significant advancement in semi-supervised table detection methods, offering a more efficient and accurate solution for practical document analysis tasks.
Pages: 363-378
Page count: 16