SWeG: Lossless and Lossy Summarization of Web-Scale Graphs

Cited by: 26
Authors
Shin, Kijung [1]
Ghoting, Amol [2]
Kim, Myunghwan [2]
Raghavan, Hema [2]
Affiliations
[1] Korea Advanced Institute of Science and Technology, School of Electrical Engineering, Daejeon, South Korea
[2] LinkedIn Corp, Mountain View, CA USA
DOI
10.1145/3308558.3313402
Chinese Library Classification number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Given a terabyte-scale graph distributed across multiple machines, how can we summarize it, with far fewer nodes and edges, so that we can restore the original graph exactly or within error bounds? As large-scale graphs are ubiquitous, ranging from web graphs to online social networks, compactly representing graphs becomes important to efficiently store and process them. Given a graph, graph summarization aims to find its compact representation consisting of (a) a summary graph where the nodes are disjoint sets of nodes in the input graph, and each edge indicates the edges between all pairs of nodes in the two sets; and (b) edge corrections for restoring the input graph from the summary graph exactly or within error bounds. Although graph summarization is a widely used graph-compression technique readily combinable with other techniques, existing algorithms for graph summarization are not satisfactory in terms of speed or compactness of outputs. More importantly, they assume that the input graph is small enough to fit in main memory. In this work, we propose SWeG, a fast parallel algorithm for summarizing graphs with compact representations. SWeG is designed not only for shared-memory but also for MapReduce settings, so that it can summarize graphs that are too large to fit in main memory. We demonstrate that SWeG is (a) Fast: SWeG is up to 5400x faster than its competitors that give similarly compact representations, (b) Scalable: SWeG scales to graphs with tens of billions of edges, and (c) Compact: combined with state-of-the-art compression methods, SWeG achieves up to 3.4x better compression than them.
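The representation described in the abstract, a summary graph plus edge corrections, can be illustrated with a small sketch of the lossless restoration step. This is not the SWeG algorithm itself (which finds the summary); it only shows how an input graph could be rebuilt from an already given summary. The function name `restore_graph`, the argument layout, and the treatment of a superedge from a supernode to itself as "all pairs inside that set" are assumptions made for this illustration.

```python
from itertools import combinations, product


def restore_graph(supernodes, superedges, additions, deletions):
    """Rebuild an undirected edge set from a summary graph and corrections.

    supernodes: dict mapping a supernode id to a set of input-graph nodes
                (the sets are assumed to be disjoint).
    superedges: iterable of (id_a, id_b) pairs in the summary graph.
    additions:  input-graph edges that no superedge covers.
    deletions:  node pairs implied by a superedge but absent from the input.
    All names here are hypothetical, chosen for this sketch only.
    """
    edges = set()
    for a, b in superedges:
        if a == b:
            # Assumption: a self-loop superedge means all pairs inside the set.
            pairs = combinations(sorted(supernodes[a]), 2)
        else:
            # A superedge stands for every pair across the two sets.
            pairs = product(supernodes[a], supernodes[b])
        for u, v in pairs:
            edges.add(frozenset((u, v)))
    # Apply the edge corrections: remove spurious pairs, add missing edges.
    edges -= {frozenset(e) for e in deletions}
    edges |= {frozenset(e) for e in additions}
    return edges


# Toy input graph with 6 edges.
original = {frozenset(e) for e in [(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (1, 2)]}

# One possible summary: 2 supernodes, 1 superedge, 2 corrections.
supernodes = {"A": {1, 2, 3}, "B": {4, 5}}
superedges = [("A", "B")]
additions = [(1, 2)]   # edge inside A that the superedge does not cover
deletions = [(3, 5)]   # pair implied by A-B but missing from the input

restored = restore_graph(supernodes, superedges, additions, deletions)
print(restored == original)
```

In this toy case the summary stores one superedge and two corrections in place of six edges, and restoring it gives back the input graph exactly. A lossy variant would drop some corrections, subject to the error bounds mentioned in the abstract.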
Pages: 1679-1690
Number of pages: 12