A cluster-based data deduplication technology

Cited by: 1
Authors
Tseng, Chuan-Mu [1]
Ciou, Jheng-Rong [2]
Liu, Tzong-Jye [2]
Affiliations
[1] Jeh Teh Jr Coll Med Nursing & Management, Dept Appl Digital Media, Miaoli, Taiwan
[2] Feng Chia Univ, Dept Informat Engn & Comp Sci, Taichung, Taiwan
Keywords
Bloom filter; cluster; data deduplication
DOI
10.1109/CANDAR.2014.22
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data deduplication systems typically use a Bloom filter to identify redundant data quickly. A Bloom filter can report whether a chunk has probably been stored already, but it can also produce false positives. To rule out a false positive, a new chunk must be compared with the chunks that have already been stored. To reduce the time spent on this check, current research stores chunk IDs in many small index tables. However, the target chunk ID is stored in only one index table, so searching the other index tables wastes a great deal of time. In this paper, we cluster the stored chunks to reduce the time needed to rule out the false positives induced by the Bloom filter.
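The abstract does not say how chunks are clustered, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical clustering rule (the first byte of a SHA-1 chunk ID selects the cluster) and illustrative Bloom filter parameters; the class and method names are invented for this example. A positive Bloom filter answer is confirmed against a single index table instead of all of them.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over byte strings (illustrative parameters)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


class ClusteredChunkIndex:
    """Chunk-ID index split into clusters, one small table per cluster."""

    def __init__(self, num_clusters=16):
        self.num_clusters = num_clusters
        self.bloom = BloomFilter()
        self.tables = [set() for _ in range(num_clusters)]

    def _cluster_of(self, chunk_id):
        # Hypothetical clustering rule (an assumption of this sketch):
        # the first byte of the chunk ID picks the cluster.
        return chunk_id[0] % self.num_clusters

    def store(self, chunk):
        """Return True if the chunk was new and stored, False if duplicate."""
        chunk_id = hashlib.sha1(chunk).digest()
        table = self.tables[self._cluster_of(chunk_id)]
        # Only a positive Bloom answer needs confirmation, and only one
        # table has to be searched to confirm or reject it.
        if self.bloom.might_contain(chunk_id) and chunk_id in table:
            return False
        self.bloom.add(chunk_id)
        table.add(chunk_id)
        return True
```

Storing the same chunk twice returns True and then False. A Bloom filter false positive costs one lookup in the chunk's own cluster table, after which the chunk is stored as new.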
Pages: 226-230
Number of pages: 5