Agile Data Engineering – Hyper Adoption
I’m slowly reading my way through Ralph Hughes’ latest Agile Data Warehousing book, “Agile Data Warehousing for the Enterprise: A Guide for Solution Architects and Project Leaders”.
Hyper Normalisation vs Hyper Generalisation
One of the areas Ralph covers extensively is Agile Data Engineering using Hyper Normalisation and Hyper Generalisation modelling techniques.
Ralph defines Hyper Normalisation as:
“I use the term hyper normalization to refer to a family of data modeling techniques that all employ ensemble data modeling.”
So that means using ensemble models such as Data Vault (which, as you will have picked up by now, we love) and Anchor modelling.
Hyper Generalisation takes it one step further: you store the data in a small number of tables and maintain the relationships and context via metadata rather than data.
Ralph describes it as:
“hyper generalized warehouse decomposes and stores an input record into multiple logical targets. The elements within the single source information record are decomposed into four categories:
- Things: All business entities (not just major “business keys” as in the hyper normalized approach)
- Links: Relationships between things
- Qualifying attributes: Analogous to the attributes in HNF
- Quantifying attributes: The measures occurring within transactions and events”
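To make the four categories concrete, here is a minimal sketch in Python. The record and field names are entirely hypothetical, and this is my illustration of the idea, not Ralph's or any tool's actual implementation:

```python
# Decompose a single flat source row into the four hyper-generalised
# categories: Things, Links, qualifying attributes, quantifying attributes.
# All field names here are made up for illustration.

def decompose(record):
    # Things: the business entities referenced by the row
    things = {"customer": record["customer_id"], "product": record["product_id"]}
    # Links: relationships between those things (here, via the order)
    links = [("customer", "product", record["order_id"])]
    # Qualifying attributes: descriptive context
    qualifying = {"customer_name": record["customer_name"]}
    # Quantifying attributes: the measures on the transaction
    quantifying = {"quantity": record["quantity"], "amount": record["amount"]}
    return things, links, qualifying, quantifying

row = {"customer_id": "C001", "product_id": "P042", "order_id": "O900",
       "customer_name": "Acme Ltd", "quantity": 3, "amount": 29.97}
things, links, qualifying, quantifying = decompose(row)
```

One source row thus lands in four logical targets instead of one physical table.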
The storage of Hyper Generalisation uses a blend of technologies including:
- An associative data store provides the core entities and rollup hierarchies for the dimensions
- A name-value pair table enriches those dimensions with attributes
- Relational transaction tables store the fact records
Sounds complicated? Well, it is.
The trick with the Hyper Generalised model is that all the metadata for your data warehouse is stored in your tool, and without it, it’s nigh on impossible to get the data out in a way that it can be used.
Compare this to Hyper Normalised (Data Vault), where you can query the data semi-easily and get the correct results back, once you understand the relationship between Hubs, Sats and Links.
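That Hub/Sat relationship really is queryable by hand. A minimal sketch (hypothetical table and column names, again via sqlite3) of joining a Hub to its Satellite to recover a business key with its descriptive attributes:

```python
# A tiny Data Vault fragment: hub_customer holds the business key,
# sat_customer holds its descriptive attributes over time.
# Table and column names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_bk TEXT);
    CREATE TABLE sat_customer (customer_hk TEXT, load_date TEXT, name TEXT);
    INSERT INTO hub_customer VALUES ('hk1', 'C001');
    INSERT INTO sat_customer VALUES ('hk1', '2016-01-01', 'Acme Ltd');
""")

# Join Hub to Sat on the hash key -- no external metadata needed.
rows = conn.execute("""
    SELECT h.customer_bk, s.name
    FROM hub_customer h
    JOIN sat_customer s ON s.customer_hk = h.customer_hk
""").fetchall()
```

The join logic is visible in the SQL itself, which is exactly why a hand-coded Data Vault remains usable without a proprietary tool.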
The other complexity with the Hyper Generalised approach is that the meta model used to store and retrieve the data is itself complex, so there is a lot of risk of things going wrong if you try to code your own. This is why Ralph recommends using commercial tools if you go down this path; Kalido is one such tool.
And that got me thinking.
DW Automation Adoption
We are seeing a marked uptake in the use of Data Vault to enable automation of the integration layers within the data warehouse. Based on our experience with it to date, I can’t see why anybody would spend months creating bespoke ETL code in an expensive, legacy flow-based ETL tool. You can deploy the same capability in weeks, if not days, using the Data Vault approach, either hand coded or using an open source product such as our ODE.
But if Hyper Generalisation is even faster why isn’t that being adopted at a greater rate?
One reason is that there are not a lot of commercial products around fuelling its adoption. But the same can be said of products supporting the Hyper Normalisation models as well.
However with Hyper Normalisation approaches such as Data Vault and Anchor modelling, you can easily build your own hand coded solution once you understand the design pattern.
And for me that is why it is being adopted at a faster rate. After all, the people adopting it typically come from a Data Warehousing background and are used to writing code, rather than an ERP or CRM background where they are used to configuring applications.
Here in New Zealand, it would be fair to say we have a real habit of building our own, rather than buying a pre-canned solution, so I think Hyper Normalisation will prevail here.
2016 the year of DW Automation?
I think so.
In my view the Big Data hype has finally started to subside. We can go back to focussing on making Data Warehousing, Analytics and Business Intelligence delivery faster and less risky. DW professionals are looking for ways to deliver better, and modelling approaches like Data Vault enable them to do this.
I think we will see some of the analyst organisations that drive the hype cycles talking about DW Automation and that will drive its adoption as well.
And of course, once the analyst organisations start talking about it, the ETL vendors will jump on the bandwagon and start changing their marketing material to position their offerings as delivering DW Automation. Hell, who knows, they may even add some features to their products that make it so.
So I say bring it on, let’s get the DW Automation party started, and move one step closer to Hyper Adoption.