SWeG: Lossless and Lossy Summarization of Web-Scale Graphs

Cited by: 26
Authors
Shin, Kijung [1]
Ghoting, Amol [2]
Kim, Myunghwan [2]
Raghavan, Hema [2]
Affiliations
[1] Korea Advanced Institute of Science and Technology, School of Electrical Engineering, Daejeon, South Korea
[2] LinkedIn Corp, Mountain View, CA, USA
DOI
10.1145/3308558.3313402
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Given a terabyte-scale graph distributed across multiple machines, how can we summarize it, with far fewer nodes and edges, so that we can restore the original graph exactly or within error bounds? As large-scale graphs are ubiquitous, ranging from web graphs to online social networks, compactly representing graphs becomes important to efficiently store and process them. Given a graph, graph summarization aims to find its compact representation consisting of (a) a summary graph where the nodes are disjoint sets of nodes in the input graph, and each edge indicates the edges between all pairs of nodes in the two sets; and (b) edge corrections for restoring the input graph from the summary graph exactly or within error bounds. Although graph summarization is a widely used graph-compression technique readily combinable with other techniques, existing algorithms for graph summarization are not satisfactory in terms of speed or compactness of outputs. More importantly, they assume that the input graph is small enough to fit in main memory. In this work, we propose SWeG, a fast parallel algorithm for summarizing graphs with compact representations. SWeG is designed for not only shared-memory but also MapReduce settings to summarize graphs that are too large to fit in main memory. We demonstrate that SWeG is (a) Fast: SWeG is up to 5400x faster than its competitors that give similarly compact representations, (b) Scalable: SWeG scales to graphs with tens of billions of edges, and (c) Compact: combined with state-of-the-art compression methods, SWeG achieves up to 3.4x better compression than them.
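The representation described in the abstract (a summary graph over disjoint node sets plus edge corrections) can be illustrated with a small lossless restoration routine. This is only a sketch of how such a representation could be decoded, not the SWeG algorithm itself, and it does not cover the lossy case. The function name `restore`, the argument names, and the toy data are all assumptions made for illustration; the graph is assumed to be undirected with no self-loops.

```python
from itertools import combinations, product


def restore(supernodes, superedges, plus, minus):
    """Rebuild the undirected edge set of the input graph (hypothetical helper).

    supernodes: dict mapping a supernode id to a set of original nodes (disjoint sets)
    superedges: iterable of (a, b) supernode-id pairs; a == b denotes a self-loop
    plus:       correction edges to add after expanding the summary graph
    minus:      correction edges to remove after expanding the summary graph
    """
    edges = set()
    for a, b in superedges:
        if a == b:
            # A self-loop stands for every pair of distinct nodes inside the set.
            pairs = combinations(sorted(supernodes[a]), 2)
        else:
            # A superedge stands for every pair of nodes across the two sets.
            pairs = product(supernodes[a], supernodes[b])
        edges.update(frozenset(p) for p in pairs)
    # Apply the corrections: drop edges the summary over-covers, add missing ones.
    edges -= {frozenset(e) for e in minus}
    edges |= {frozenset(e) for e in plus}
    return edges


# Toy input (assumed data, not taken from the paper).
supernodes = {"A": {1, 2}, "B": {3, 4}, "C": {5}}
superedges = [("A", "B"), ("A", "A")]
plus = [(4, 5)]
minus = [(2, 4)]

restored = restore(supernodes, superedges, plus, minus)
print(sorted(tuple(sorted(e)) for e in restored))
# prints [(1, 2), (1, 3), (1, 4), (2, 3), (4, 5)]
```

In this toy case, two superedges and two corrections encode five original edges; the savings grow when many nodes share similar neighborhoods.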
Pages: 1679-1690
Page count: 12