Dynamic Data Protection System for Open Big Data Environment

Authors
Tu Y.-F. [1,2]
Niu J.-H. [2]
Wang D.-Z. [1,2]
Gao H. [2]
Xu J. [2]
Hong K. [2]
Yang F. [2]
Affiliations
[1] State Key Laboratory of Mobile Network and Mobile Multimedia Technology, ZTE Corporation, Shenzhen
[2] ZTE Corporation, Nanjing
Source
Ruan Jian Xue Bao/Journal of Software | 2023 / Vol. 34 / No. 3
Keywords
big data; data masking; dynamic data masking; query dependency; SQL rewriting
DOI
10.13328/j.cnki.jos.006783
Abstract
Big data has become a basic national strategic resource, and the opening and sharing of data is the core of China's big data strategy. Cloud-native technology and the lake-house architecture are reshaping big data infrastructure and promoting data sharing and the dissemination of data value. The development of the big data industry and its technology requires stronger data security and data sharing capabilities. However, data security in an open environment has become a bottleneck that restricts the development and utilization of big data technology, and data security and privacy protection issues have become increasingly prominent in both the open-source big data ecosystem and commercial big data systems. Dynamic data protection in an open big data environment now faces challenges in data availability, processing efficiency, system scalability, and other areas. This study proposes BDMasker, a dynamic data protection system for the open big data environment. Using precise query analysis and query rewriting based on a query dependency model, BDMasker accurately perceives the original business request without changing it, so the entire dynamic data masking process has zero impact on the business. Furthermore, its unified, multi-engine security policy framework extends dynamic data protection capabilities vertically and scales them horizontally across multiple computing engines. The system also uses the distributed computing capability of the big data execution engine to improve its data protection performance. Experimental results show that BDMasker's precise SQL analysis and rewriting technique is effective, that the system has good scalability and performance, and that overall performance fluctuates within 3% on the TPC-DS and YCSB benchmarks. © 2023 Chinese Academy of Sciences. All rights reserved.
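The abstract describes masking performed by query analysis and rewriting rather than by altering stored data. The sketch below is a minimal illustration of that general idea under stated assumptions, not BDMasker's actual query dependency model. It represents SELECT-list expressions as a small tree and derives the base columns each output column depends on. It then wraps every sensitive column reference in a masking function while keeping the output alias, so the application still receives the same result schema. The classes `Col`, `Lit`, `Func` and `SelectItem`, the function `rewrite_select`, and the masking UDF name `mask_phone` are all hypothetical. A real system would obtain the expression tree from the engine's own SQL parser instead of building it by hand.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# Hypothetical expression tree for SELECT-list items (illustrative only).
@dataclass(frozen=True)
class Col:
    name: str  # qualified base column, e.g. "customer.phone"


@dataclass(frozen=True)
class Lit:
    value: str  # SQL literal text, already quoted, e.g. "'+86-'"


@dataclass(frozen=True)
class Func:
    name: str
    args: tuple  # child expressions


Expr = Union[Col, Lit, Func]


@dataclass(frozen=True)
class SelectItem:
    expr: Expr
    alias: str  # output column name the application sees


def dependencies(expr: Expr) -> set[str]:
    """Base columns an output expression depends on (a tiny dependency model)."""
    if isinstance(expr, Col):
        return {expr.name}
    if isinstance(expr, Func):
        deps: set[str] = set()
        for arg in expr.args:
            deps |= dependencies(arg)
        return deps
    return set()


def mask(expr: Expr, policy: dict[str, str]) -> Expr:
    """Wrap each sensitive column reference in its (hypothetical) masking function."""
    if isinstance(expr, Col):
        return Func(policy[expr.name], (expr,)) if expr.name in policy else expr
    if isinstance(expr, Func):
        return Func(expr.name, tuple(mask(arg, policy) for arg in expr.args))
    return expr


def render(expr: Expr) -> str:
    """Turn an expression tree back into SQL text."""
    if isinstance(expr, Col):
        return expr.name
    if isinstance(expr, Lit):
        return expr.value
    return f"{expr.name}({', '.join(render(arg) for arg in expr.args)})"


def rewrite_select(items: list[SelectItem], rest: str, policy: dict[str, str]) -> str:
    """Mask only the SELECT list; `rest` (FROM/WHERE/...) passes through unchanged."""
    parts = []
    for item in items:
        expr = item.expr
        if dependencies(expr) & set(policy):  # output depends on a sensitive column
            expr = mask(expr, policy)
        parts.append(f"{render(expr)} AS {item.alias}")  # keep the original alias
    return f"SELECT {', '.join(parts)} {rest}"


if __name__ == "__main__":
    policy = {"customer.phone": "mask_phone"}  # hypothetical masking UDF
    items = [
        SelectItem(Col("customer.name"), "name"),
        SelectItem(Func("concat", (Lit("'+86-'"), Col("customer.phone"))), "contact"),
    ]
    print(rewrite_select(items, "FROM customer WHERE customer.city = 'Nanjing'", policy))
    # SELECT customer.name AS name, concat('+86-', mask_phone(customer.phone)) AS contact FROM customer WHERE customer.city = 'Nanjing'
```

In this sketch the pass-through `FROM ... WHERE ...` text is left untouched. Filtering therefore still operates on raw values and the query returns the same rows; only the projected values are masked. The sketch does not decide whether predicates on sensitive columns should also be restricted. That is a separate policy choice.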
Pages: 1213-1235
Number of pages: 22