1. Agile Data Science
When the new world of data science and big data arrived, it came to solve the problems of data velocity, volume and variety that the technologies of the time could not easily handle.
Along with this new technology came a push to do things faster: working in teams of one, hand-coding everything, and so on. These practices often did make a person faster (the first time), as teams of one usually are.
Of course, the downside was often code that was harder to sustain and manage. None of these issues were caused by the technologies themselves; they came from removing the processes that were typically put in place on data warehouse projects (processes which, in turn, were one of the factors that made those projects slower).
So I find it interesting to see some of these processes and structures start to be applied to the Data Science and Big Data realm. This article on a “Data Science Architect” is a perfect example.
2. Governed Data Lake
And here is another example. We have always “staged” data in a data warehouse; we just never let anybody near it. Lately, there has been an interesting move to use a Data Vault as a governed data lake.
For me, the way to go is a loosely structured persistent landing layer, “structured” using a tagging paradigm, which is then consumed into a Data Vault when the data needs to be enriched and presented to users.
But here is Talend’s view, which is an interesting read nonetheless:
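To make the tagged-landing-layer idea a little more concrete, here is a minimal Python sketch under some loud assumptions. The landing layer is just an in-memory list of records, each carrying a set of free-form tags; only records with a requested tag are loaded into a simplified Data Vault hub (one row per business key) and satellite (a new row only when the descriptive attributes change). The names `LandedRecord`, `DataVault.load` and the `customer` tag are hypothetical, and the MD5 hash keys and hashdiffs follow a common Data Vault 2.0 convention rather than anything prescribed here. A real implementation would work against database tables, not Python lists.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LandedRecord:
    """Hypothetical row in the persistent landing layer: raw payload plus free-form tags."""
    source: str
    payload: dict
    tags: set = field(default_factory=set)
    loaded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def hash_key(*parts: str) -> str:
    """MD5 of trimmed, upper-cased parts (an assumed Data Vault 2.0-style convention)."""
    normalised = "||".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


class DataVault:
    """Toy vault: hubs map hash key -> business key; satellites hold attribute history."""

    def __init__(self) -> None:
        self.hubs: dict = {}
        self.satellites: dict = {}

    def load(self, records, tag, hub, business_key, attributes):
        hub_rows = self.hubs.setdefault(hub, {})
        sat_rows = self.satellites.setdefault(f"sat_{hub}", [])
        for rec in records:
            if tag not in rec.tags:
                continue  # data without this tag simply stays in the landing layer
            bk = str(rec.payload[business_key])
            hk = hash_key(bk)
            hub_rows.setdefault(hk, bk)  # hub is insert-only: one row per business key
            attrs = {a: rec.payload.get(a) for a in attributes}
            hashdiff = hash_key(*(str(attrs[a]) for a in attributes))
            # Most recent satellite row for this key, if any.
            latest = next((r for r in reversed(sat_rows) if r["hub_key"] == hk), None)
            if latest is None or latest["hashdiff"] != hashdiff:
                sat_rows.append({"hub_key": hk, "hashdiff": hashdiff,
                                 "load_date": rec.loaded_at, "source": rec.source, **attrs})


# Hypothetical landing data: three CRM rows for one customer (two distinct versions)
# plus clickstream data carrying a different tag.
landing = [
    LandedRecord("crm", {"customer_id": 1, "name": "Ann"}, {"customer"}),
    LandedRecord("crm", {"customer_id": 1, "name": "Ann"}, {"customer"}),
    LandedRecord("crm", {"customer_id": 1, "name": "Anne"}, {"customer"}),
    LandedRecord("web", {"session": "abc"}, {"clickstream"}),
]
vault = DataVault()
vault.load(landing, tag="customer", hub="customer",
           business_key="customer_id", attributes=["name"])
print(len(vault.hubs["customer"]), len(vault.satellites["sat_customer"]))  # prints: 1 2
```

The clickstream record is never modelled; it waits in the landing layer until someone decides it is worth enriching. The duplicate “Ann” row adds nothing to the satellite because its hashdiff matches the latest version.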
3. The Agile Data Warehouse: A Practical Approach
The reason I like landing data first and then consuming it into the Data Vault as required is that it gets us closer to the nirvana an AgileBI project should strive for: requirements to production in 3 weeks.
And by this I don’t mean just uploading an Excel spreadsheet into an “Agile BI Tool” with no reusable layers in between; that’s called a prototype (at best).
So the last one for this week is a great article on how to approach an Agile Data Warehouse.
When the team have time, I nag them to add cool new features to ODE, our open source Data Vault automation engine at ode.ninja.
Most of my time is actually spent teaching and coaching customers how to deliver using AgileBI as part of the Optimal Orange team.