Safely Managing Data Variety in Big Data Software Development

Authors
Cerqueus, Thomas [1]
de Almeida, Eduardo Cunha [2]
Scherzinger, Stefanie [3]
Affiliations
[1] Univ Lyon, CNRS, INSA Lyon, LIRIS, UMR5205, Lyon, France
[2] Univ Fed Parana, BR-80060000 Curitiba, Parana, Brazil
[3] OTH Regensburg, Regensburg, Germany
Keywords
SCHEMA EVOLUTION; MODEL
DOI
10.1109/BIGDSE.2015.9
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
We consider the task of building Big Data software systems offered as software-as-a-service. These applications are commonly backed by NoSQL data stores that address the proverbial Vs of Big Data processing: NoSQL data stores can handle large volumes of data, and many systems do not enforce a global schema, which accommodates structural variety in data. Thus, software engineers can design the data model on the go, a flexibility that is particularly crucial in agile software development. However, NoSQL data stores commonly do not yet account for veracity when the structure of persisted data changes, even though such changes are an inevitable consequence of agile software development. In most NoSQL-based application stacks, schema evolution is handled entirely within the application code, usually involving object mapper libraries. Yet simple code refactorings, such as renaming a class attribute at the source code level, can cause data loss or runtime errors once the application has been deployed to production. We address this pain point by contributing type checking rules that we have implemented within an IDE plugin. Our plugin, ControVol, statically type checks the object mapper class declarations against the code release history. ControVol is thus capable of detecting common yet risky cases of mismatched data and schema, and can even suggest automatic fixes.
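The renaming hazard described in the abstract can be illustrated with a minimal sketch. This is not ControVol's implementation or its type checking rules, and it does not use any real object mapper API: the classes PlayerV1 and PlayerV2, the functions load, save, and unmatched_properties, and the stored entity are all hypothetical names chosen for illustration. The sketch assumes a schemaless store modeled as a plain dictionary and a naive mapper that silently ignores properties it cannot match.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional

# Hypothetical persisted entity, written by release 1, when the class
# attribute was still called "level".
stored_entity: Dict[str, Any] = {"name": "Alice", "level": 7}


@dataclass
class PlayerV1:
    # Class declaration as of release 1 (assumed for illustration).
    name: Optional[str] = None
    level: Optional[int] = None


@dataclass
class PlayerV2:
    # Release 2 renames "level" to "rank" at the source code level only;
    # the persisted entities are not migrated.
    name: Optional[str] = None
    rank: Optional[int] = None


def load(cls, entity: Dict[str, Any]):
    """Naive mapper: copy properties that match a declared attribute, ignore the rest."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in entity.items() if k in known})


def save(obj) -> Dict[str, Any]:
    """Naive mapper: persist exactly the attributes declared on the class."""
    return {f.name: getattr(obj, f.name) for f in fields(obj)}


def unmatched_properties(old_cls, new_cls) -> List[str]:
    """Toy static check: attributes of the previous release that the new one no longer declares."""
    old = {f.name for f in fields(old_cls)}
    new = {f.name for f in fields(new_cls)}
    return sorted(old - new)


# Loading the old entity with the new class leaves "rank" unset ...
player = load(PlayerV2, stored_entity)
# ... and saving it back drops the "level" value from the store.
rewritten = save(player)

# Comparing the two declarations flags the risk before deployment.
print(unmatched_properties(PlayerV1, PlayerV2))
```

Under these assumptions, the value stored under "level" is lost after a single load and save cycle, whereas comparing the class declarations of the two releases flags the renamed attribute without touching any data. The abstract's idea of checking declarations against the code release history is only approximated here by comparing two hand-written classes.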
Pages: 4-10
Number of pages: 7