A Hybrid Probabilistic Approach for Table Understanding

Authors
Sun, Kexuan [1]
Rayudu, Harsha [1]
Pujara, Jay [1]
Affiliation
[1] University of Southern California, Information Sciences Institute, Los Angeles, CA 90089, USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Tables of data are used to record vast amounts of socioeconomic, scientific, and governmental information. Although humans create tables according to underlying organizational principles, AI systems still struggle to understand the contents of these tables. This paper introduces an end-to-end system for table understanding, the process of capturing the relational structure of data in tables. We introduce models that identify cell types, group these cells into blocks of data that serve a similar functional role, and predict the relationships between these blocks. Our approach is a hybrid, neuro-symbolic one: it combines embedded representations learned from thousands of tables with probabilistic constraints that capture regularities in how humans organize tables. The neuro-symbolic model is better able to capture positional invariants of headers and to enforce homogeneity of data types. One limitation in this research area is the lack of rich datasets for evaluating end-to-end table understanding, so we introduce a new benchmark dataset consisting of 431 diverse tables from data.gov. The evaluation results show that our system achieves state-of-the-art performance on cell type classification, block identification, and relationship prediction, improving over prior efforts by up to 7% in macro F1 score.
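The headline result is reported in macro F1, the unweighted mean of the per-class F1 scores, so a rare cell type weighs as much as a common one. A minimal Python sketch of the metric follows. The label names (`header`, `data`, `metadata`) and the toy gold/predicted sequences are hypothetical illustrations, not the paper's cell-type taxonomy, dataset, or evaluation code.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[g] += 1          # missed an instance of class g
    f1_scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Macro averaging: each class contributes equally, whatever its frequency.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical cell-type labels for six cells.
gold = ["header", "data", "data", "metadata", "data", "header"]
pred = ["header", "data", "header", "metadata", "data", "data"]
print(round(macro_f1(gold, pred), 3))  # 0.722
```

Here the per-class F1 scores are 2/3 (`data`), 0.5 (`header`), and 1.0 (`metadata`), whose plain mean is about 0.722; a micro-averaged score would instead be dominated by the most frequent class.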
Pages: 4366-4374
Number of pages: 9