A Versatile Hypergraph Model for Document Collections

Cited by: 1

Authors:

Spitz, Andreas [1]

Aumiller, Dennis [2]

Soproni, Balint [2]

Gertz, Michael [2]

Affiliations:

[1] Ecole Polytech Fed Lausanne, Lausanne, Switzerland

[2] Heidelberg Univ, Heidelberg, Germany

Source:

PROCEEDINGS OF THE 32ND INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, SSDBM 2020 | 2020

Keywords:

COOCCURRENCE DATA; CENTRALITY;

DOI:

10.1145/3400903.3400919

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

Efficiently and effectively representing large collections of text is of central importance to information retrieval tasks such as summarization and search. Since models for these tasks frequently rely on an implicit graph structure of the documents or their contents, graph-based document representations are naturally appealing. For tasks that consider the joint occurrence of words or entities, however, existing document representations often fall short in capturing cooccurrences of higher order, higher multiplicity, or at varying proximity levels. Furthermore, while numerous applications benefit from structured knowledge sources, external data sources are rarely considered as integral parts of existing document models. To address these shortcomings, we introduce heterogeneous hypergraphs as a versatile model for representing annotated document collections. We integrate external metadata, document content, entity and term annotations, and document segmentation at different granularity levels in a joint model that bridges the gap between structured and unstructured data. We discuss selection and transformation operations on the set of hyperedges, which can be chained to support a wide range of query scenarios. To ensure compatibility with established information retrieval methods, we discuss projection operations that transform hyperedges to traditional dyadic cooccurrence graph representations. Using PostgreSQL and Neo4j, we investigate the suitability of existing database systems for implementing the hypergraph document model, and explore the impact of utilizing implicit and materialized hyperedge representations on storage space requirements and query performance.
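The abstract names three kinds of operations on the hypergraph: selection of hyperedges, transformations on them that can be chained, and projection of hyperedges to a dyadic cooccurrence graph. The following is a minimal in-memory sketch of how such a chain might look, assuming each hyperedge is a document segment (for example, a sentence) holding a set of typed nodes such as entities and terms. It is not the authors' implementation, which is built on PostgreSQL and Neo4j; the names `Hyperedge`, `select`, `restrict_types`, and `project`, the granularity labels, and the node-type labels are all hypothetical. The projection here is a plain clique expansion in which a pair's weight counts the hyperedges it co-occurs in; the paper may use other weightings.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical node representation: (node_type, label), e.g. ("entity", "Heidelberg").
Node = Tuple[str, str]


@dataclass(frozen=True)
class Hyperedge:
    """One hyperedge: a document segment and the set of nodes it contains (assumed layout)."""
    doc_id: str
    granularity: str  # hypothetical labels such as "sentence" or "document"
    nodes: FrozenSet[Node]


def select(edges: Iterable[Hyperedge], pred: Callable[[Hyperedge], bool]) -> List[Hyperedge]:
    """Selection: keep only the hyperedges that satisfy a predicate."""
    return [e for e in edges if pred(e)]


def restrict_types(edges: Iterable[Hyperedge], types: Set[str]) -> List[Hyperedge]:
    """Transformation: drop nodes whose type is not in `types`.

    Hyperedges left with fewer than two nodes are discarded, since they
    cannot contribute any pair to a cooccurrence graph.
    """
    out = []
    for e in edges:
        kept = frozenset(n for n in e.nodes if n[0] in types)
        if len(kept) >= 2:
            out.append(Hyperedge(e.doc_id, e.granularity, kept))
    return out


def project(edges: Iterable[Hyperedge]) -> Counter:
    """Projection (clique expansion): each hyperedge adds 1 to every node pair it contains.

    Pairs are stored in sorted order so (u, v) and (v, u) map to the same key.
    """
    weights: Counter = Counter()
    for e in edges:
        for u, v in combinations(sorted(e.nodes), 2):
            weights[(u, v)] += 1
    return weights


# Usage: chain selection -> transformation -> projection on toy data.
edges = [
    Hyperedge("d1", "sentence", frozenset({("entity", "Heidelberg"), ("entity", "Lausanne"), ("term", "university")})),
    Hyperedge("d1", "sentence", frozenset({("entity", "Heidelberg"), ("entity", "Lausanne")})),
    Hyperedge("d2", "document", frozenset({("entity", "Heidelberg"), ("term", "graph")})),
]
sentences = select(edges, lambda e: e.granularity == "sentence")
graph = project(restrict_types(sentences, {"entity"}))
print(graph)  # Counter({(('entity', 'Heidelberg'), ('entity', 'Lausanne')): 2})
```

Because each operation takes and returns a list of hyperedges (until the final projection), the steps compose in any order, which mirrors the chaining the abstract describes; a database-backed version would push the same filters into SQL or Cypher queries instead.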


Pages: 12
