Big Metadata: When Metadata is Big Data

Cited by: 6
Authors
Edara, Pavan [1]
Pasumansky, Mosha [1]
Affiliations
[1] Google LLC, Mountain View, CA 94043 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2021 / Volume 14 / Issue 12
Keywords
DREMEL
DOI
10.14778/3476311.3476385
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The rapid emergence of cloud data warehouses like Google BigQuery has redefined the landscape of data analytics. With the growth of data volumes, such systems need to scale to hundreds of EiB of data in the near future. This growth is accompanied by an increase in the number of objects stored and the amount of metadata such systems must manage. Traditionally, Big Data systems have tried to reduce the amount of metadata in order to scale the system, often compromising query performance. In Google BigQuery, we built a metadata management system that demonstrates that massive scale can be achieved without such tradeoffs. We recognized the benefits that fine-grained metadata provides for query processing, and we built a metadata system to manage it effectively. We use the same distributed query processing and data management techniques that we use for managing data to handle Big metadata. Today, BigQuery uses these techniques to support queries over billions of objects and their metadata.
Pages: 3083 - 3095
Number of pages: 13
Related papers
50 in total
  • [11] ELECTRONIC HEALTH RECORDS DATA AND METADATA: Challenges for Big Data in the United States
    Sweet, Lauren E.
    Moulaison, Heather Lea
    BIG DATA, 2013, 1 (04) : BD245 - BD251
  • [12] A proteomics sample metadata representation for multiomics integration and big data analysis
    Dai, Chengxin
    Fullgrabe, Anja
    Pfeuffer, Julianus
    Solovyeva, Elizaveta M.
    Deng, Jingwen
    Moreno, Pablo
    Kamatchinathan, Selvakumar
    Kundu, Deepti Jaiswal
    George, Nancy
    Fexova, Silvie
    Gruening, Bjoern
    Foell, Melanie Christine
    Griss, Johannes
    Vaudel, Marc
    Audain, Enrique
    Locard-Paulet, Marie
    Turewicz, Michael
    Eisenacher, Martin
    Uszkoreit, Julian
    Van den Bossche, Tim
    Schwammle, Veit
    Webel, Henry
    Schulze, Stefan
    Bouyssie, David
    Jayaram, Savita
    Duggineni, Vinay Kumar
    Samaras, Patroklos
    Wilhelm, Mathias
    Choi, Meena
    Wang, Mingxun
    Kohlbacher, Oliver
    Brazma, Alvis
    Papatheodorou, Irene
    Bandeira, Nuno
    Deutsch, Eric W.
    Vizcaino, Juan Antonio
    Bai, Mingze
    Sachsenberg, Timo
    Levitsky, Lev I.
    Perez-Riverol, Yasset
    NATURE COMMUNICATIONS, 2021, 12 (01)
  • [13] A proteomics sample metadata representation for multiomics integration and big data analysis
    Chengxin Dai
    Anja Füllgrabe
    Julianus Pfeuffer
    Elizaveta M. Solovyeva
    Jingwen Deng
    Pablo Moreno
    Selvakumar Kamatchinathan
    Deepti Jaiswal Kundu
    Nancy George
    Silvie Fexova
    Björn Grüning
    Melanie Christine Föll
    Johannes Griss
    Marc Vaudel
    Enrique Audain
    Marie Locard-Paulet
    Michael Turewicz
    Martin Eisenacher
    Julian Uszkoreit
    Tim Van Den Bossche
    Veit Schwämmle
    Henry Webel
    Stefan Schulze
    David Bouyssié
    Savita Jayaram
    Vinay Kumar Duggineni
    Patroklos Samaras
    Mathias Wilhelm
    Meena Choi
    Mingxun Wang
    Oliver Kohlbacher
    Alvis Brazma
    Irene Papatheodorou
    Nuno Bandeira
    Eric W. Deutsch
    Juan Antonio Vizcaíno
    Mingze Bai
    Timo Sachsenberg
    Lev I. Levitsky
    Yasset Perez-Riverol
    Nature Communications, 12
  • [14] Extended query model for MOOC education resource metadata based on big data
    Cao, Yu
    Chen, Shu-Wen
    INTERNATIONAL JOURNAL OF CONTINUING ENGINEERING EDUCATION AND LIFE-LONG LEARNING, 2019, 29 (04) : 374 - 387
  • [15] Device-driven Metadata Management Solutions for Scientific Big Data Use Cases
    Grunzke, Richard
    Mueller-Pfefferkorn, Ralph
    Jaekel, Rene
    Starek, Juergen
    Hardt, Marcus
    Hartmann, Volker
    Potthoff, Jan
    Hesser, Juergen
    Kepper, Nick
    Gesing, Sandra
    Kindermann, Stephan
    2014 22ND EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2014), 2014, : 317 - 321
  • [16] RGMQL: scalable and interoperable computing of heterogeneous omics big data and metadata in R/Bioconductor
    Pallotta, Simone
    Cascianelli, Silvia
    Masseroli, Marco
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [17] RGMQL: scalable and interoperable computing of heterogeneous omics big data and metadata in R/Bioconductor
    Simone Pallotta
    Silvia Cascianelli
    Marco Masseroli
    BMC Bioinformatics, 23
  • [18] Studies of Big Data metadata segmentation between relational and non-relational databases
    Golosova, M. V.
    Grigorieva, M. A.
    Klimentov, A. A.
    Ryabinkin, E. A.
    Dimitrov, G.
    Potekhin, M.
    21ST INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP2015), PARTS 1-9, 2015, 664
  • [19] A Stakeholder Lens on Metadata Management in Business Intelligence and Big Data - Results of an Empirical Investigation
    Dinter, Barbara
    Schieder, Christian
    Gluchowski, Peter
    AMCIS 2015 PROCEEDINGS, 2015
  • [20] Metadata Traces and Workload Models for Evaluating Big Storage Systems
    Abad, Cristina L.
    Luu, Huong
    Roberts, Nathan
    Lee, Kihwal
    Lu, Yi
    Campbell, Roy H.
    2012 IEEE/ACM FIFTH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC 2012), 2012, : 125 - 132