A Versatile Hypergraph Model for Document Collections

Authors
Spitz, Andreas [1]
Aumiller, Dennis [2]
Soproni, Balint [2]
Gertz, Michael [2]
Affiliations
[1] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
[2] Heidelberg Univ, Heidelberg, Germany
Keywords
COOCCURRENCE DATA; CENTRALITY
DOI
10.1145/3400903.3400919
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Efficiently and effectively representing large collections of text is of central importance to information retrieval tasks such as summarization and search. Since models for these tasks frequently rely on an implicit graph structure of the documents or their contents, graph-based document representations are naturally appealing. For tasks that consider the joint occurrence of words or entities, however, existing document representations often fall short in capturing cooccurrences of higher order, higher multiplicity, or at varying proximity levels. Furthermore, while numerous applications benefit from structured knowledge sources, external data sources are rarely considered as integral parts of existing document models. To address these shortcomings, we introduce heterogeneous hypergraphs as a versatile model for representing annotated document collections. We integrate external metadata, document content, entity and term annotations, and document segmentation at different granularity levels in a joint model that bridges the gap between structured and unstructured data. We discuss selection and transformation operations on the set of hyperedges, which can be chained to support a wide range of query scenarios. To ensure compatibility with established information retrieval methods, we discuss projection operations that transform hyperedges to traditional dyadic cooccurrence graph representations. Using PostgreSQL and Neo4j, we investigate the suitability of existing database systems for implementing the hypergraph document model, and explore the impact of utilizing implicit and materialized hyperedge representations on storage space requirements and query performance.
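The chaining of selection and projection operations described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the data, the `select` and `project` helper names, and the choice of plain pair counts as edge weights are all assumptions made for illustration. Each hyperedge here stands for one text segment (for example, a sentence) and holds the set of entities and terms annotated in it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: one hyperedge per text segment, tagged with the
# document it came from. Names and structure are illustrative assumptions.
hyperedges = [
    {"doc": "d1", "nodes": {"Heidelberg", "Germany", "university"}},
    {"doc": "d1", "nodes": {"Heidelberg", "university"}},
    {"doc": "d2", "nodes": {"Lausanne", "Switzerland"}},
]


def select(edges, predicate):
    """Selection: keep only the hyperedges that satisfy the predicate."""
    return [e for e in edges if predicate(e)]


def project(edges):
    """Projection: turn hyperedges into a weighted dyadic cooccurrence graph.

    Every unordered pair of nodes inside a hyperedge becomes one graph edge;
    the weight counts how many hyperedges contain that pair.
    """
    weights = Counter()
    for e in edges:
        # Sorting gives each unordered pair a single canonical key.
        for u, v in combinations(sorted(e["nodes"]), 2):
            weights[(u, v)] += 1
    return weights


# Chain the two operations: restrict to document d1, then project.
d1_graph = project(select(hyperedges, lambda e: e["doc"] == "d1"))
for (u, v), w in sorted(d1_graph.items()):
    print(u, v, w)
```

Counting pairs is only one possible weighting. Because projection discards the higher-order grouping, a hyperedge with three nodes becomes indistinguishable from three separate pairwise cooccurrences, which is the kind of information loss the hypergraph model is meant to avoid until a dyadic graph is actually needed.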
Pages: 12