Frost: A Platform for Benchmarking and Exploring Data Matching Results

Cited by: 1
Authors
Graf, Martin [1]
Laskowski, Lukas [1]
Papsdorf, Florian [1]
Sold, Florian [1]
Gremmelspacher, Roland [2]
Naumann, Felix [1]
Panse, Fabian [3]
Affiliations
[1] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
[2] SAP SE, Walldorf, Germany
[3] Univ Hamburg, Hamburg, Germany
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2022 / Vol. 15 / No. 12
Keywords
ENTITY; MAGELLAN
DOI
10.14778/3554821.3554823
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
"Bad" data has a direct impact on 88% of companies, with the average company losing 12% of its revenue because of it. Duplicates, that is, multiple but different representations of the same real-world entity, are among the main reasons for poor data quality, so finding and configuring the right deduplication solution is essential. Existing data matching benchmarks focus on the quality of matching results and neglect other important factors, such as business requirements. They also often do not support the exploration of data matching results. To close this gap between merely counting record pairs and comprehensively evaluating data matching solutions, we present the Frost platform. It combines existing benchmarks, established quality metrics, cost and effort metrics, and exploration techniques, making it the first platform to allow systematic exploration for understanding matching results. Frost is implemented and published in the open-source application Snowman, which includes the visual exploration of matching results, as shown in Figure 1.
Pages: 3292-3305
Page count: 14