Agile Data Engineering – Hyper Adoption

Feb 5, 2016

I’m slowly reading my way through Ralph Hughes’ latest Agile Data Warehousing book, “Agile Data Warehousing for the Enterprise: A Guide for Solution Architects and Project Leaders”.

Hyper Normalisation vs Hyper Generalisation

One of the areas Ralph covers extensively is Agile Data Engineering using the Hyper Normalisation and Hyper Generalisation modelling techniques.

Ralph defines Hyper Normalisation as:

“I use the term hyper normalization to refer to a family of data modeling techniques that all employ ensemble data modeling.”

So that means using ensemble models such as Data Vault (which, as you will have picked up by now, we love) and Anchor modelling.

Hyper Generalisation takes it one step further: you store the data in a small number of tables and maintain the relationships and context via metadata rather than data.

Ralph describes it as:

“[A] hyper generalized warehouse decomposes and stores an input record into multiple logical targets. The elements within the single source information record are decomposed into four categories:

  • Things: All business entities (not just major “business keys” as in the hyper normalized approach)
  • Links: Relationships between things
  • Qualifying attributes: Analogous to the attributes in HNF
  • Quantifying attributes: The measures occurring within transactions and events”
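
To make that decomposition a little more concrete, here’s a quick sketch in Python of how a single sales record might split into those four categories. The record, entity names and structure are all my own invention for illustration, not an example from Ralph’s book or from any particular tool.

    # Purely illustrative: splitting one source "sale" record into the four
    # hyper generalised categories. All names are made up for this example.

    source_record = {
        "customer_code": "C-1001",      # a business entity
        "product_code": "P-55",         # another business entity
        "customer_segment": "Retail",   # qualifying attribute (context)
        "sale_amount": 249.95,          # quantifying attribute (measure)
    }

    decomposed = {
        # Things: every business entity, not just the major business keys
        "things": [
            {"entity": "Customer", "key": source_record["customer_code"]},
            {"entity": "Product", "key": source_record["product_code"]},
        ],
        # Links: relationships between things
        "links": [
            {"relationship": "Sale",
             "members": [source_record["customer_code"],
                         source_record["product_code"]]},
        ],
        # Qualifying attributes: descriptive context for a thing
        "qualifying": [
            {"key": source_record["customer_code"],
             "attribute": "customer_segment",
             "value": source_record["customer_segment"]},
        ],
        # Quantifying attributes: the measures on the transaction
        "quantifying": [
            {"relationship": "Sale",
             "measure": "sale_amount",
             "value": source_record["sale_amount"]},
        ],
    }

    print(decomposed["things"])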

Hyper Generalised storage uses a blend of technologies, including:

  • An associative data store that provides the core entities and rollup hierarchies for the dimensions
  • A name-value pair table that enriches those dimensions with attributes
  • Relational transaction tables that store the fact records
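
To give a feel for what that blend looks like physically, here’s a drastically simplified sketch using SQLite from Python. The table and column names are my own, and a real product’s meta model (Kalido’s included) is far richer than this, so treat it as a caricature rather than a reference schema.

    import sqlite3

    # Caricature of a hyper generalised store: a handful of generic tables.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- Associative store: every business entity plus its rollup parent
    CREATE TABLE thing (
        thing_id        INTEGER PRIMARY KEY,
        thing_type      TEXT NOT NULL,   -- e.g. 'Customer', 'Product', 'Region'
        business_key    TEXT NOT NULL,
        parent_thing_id INTEGER REFERENCES thing(thing_id)  -- dimension rollup
    );

    -- Name-value pairs: qualifying attributes that enrich the dimensions
    CREATE TABLE thing_attribute (
        thing_id   INTEGER REFERENCES thing(thing_id),
        attr_name  TEXT NOT NULL,
        attr_value TEXT
    );

    -- Relational transaction table: the fact records (quantifying attributes)
    CREATE TABLE sale_fact (
        customer_thing_id INTEGER REFERENCES thing(thing_id),
        product_thing_id  INTEGER REFERENCES thing(thing_id),
        sale_amount       REAL,
        sale_date         TEXT
    );
    """)

Notice that nothing in those tables tells you what a thing_type of ‘Customer’ means, which attributes belong to which kind of thing, or how the hierarchies roll up. All of that lives in the tool’s metadata.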

Sounds Complex

Well, it is.

The catch with the Hyper Generalised model is that all the metadata for your data warehouse is stored in your tool, and without that tool it’s near on impossible to get the data out in a way that it can be used.

Compare this to a Hyper Normalised model such as Data Vault, where you can query the data relatively easily and get the correct results back, once you understand the relationship between Hubs, Sats and Links.
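
For comparison, here’s a hypothetical Hub / Link / Satellite query, again using SQLite from Python. The table and column names are my own simplified versions rather than a formal Data Vault 2.0 layout, but the shape of the joins is the point: Hub to Satellite for context, Hub to Link to Hub for relationships.

    import sqlite3

    # Simplified Data Vault structures; names and columns are illustrative only.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_code TEXT, load_dts TEXT);
    CREATE TABLE sat_customer (customer_hk TEXT, load_dts TEXT, customer_name TEXT, segment TEXT);
    CREATE TABLE hub_product  (product_hk TEXT PRIMARY KEY, product_code TEXT, load_dts TEXT);
    CREATE TABLE link_sale    (sale_hk TEXT PRIMARY KEY, customer_hk TEXT, product_hk TEXT, load_dts TEXT);
    CREATE TABLE sat_sale     (sale_hk TEXT, load_dts TEXT, sale_amount REAL);
    """)

    # Hub -> Satellite gives you context, Hub -> Link -> Hub gives you relationships.
    # (A real query would also pick the current Satellite row per key.)
    query = """
    SELECT hc.customer_code,
           sc.customer_name,
           hp.product_code,
           ss.sale_amount
    FROM   hub_customer hc
    JOIN   sat_customer sc ON sc.customer_hk = hc.customer_hk
    JOIN   link_sale    ls ON ls.customer_hk = hc.customer_hk
    JOIN   hub_product  hp ON hp.product_hk  = ls.product_hk
    JOIN   sat_sale     ss ON ss.sale_hk     = ls.sale_hk
    """
    print(conn.execute(query).fetchall())   # empty until rows are loaded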

The other complexity with the Hyper Generalised approach is that the meta model used to store and retrieve the data is itself complex, so there is a lot of risk of things going wrong if you try to code your own. That is why Ralph recommends using a commercial tool if you go down this path; Kalido is one such tool.

And that got me thinking.

DW Automation Adoption

We are seeing a marked uptake in the use of Data Vault to enable automation of the integration layers within the data warehouse. Based on our experience with it to date, I can’t see why anybody would spend months creating bespoke ETL code in an expensive, legacy flow-based ETL tool. You can deploy the same capability in weeks, if not days, using the Data Vault approach, either hand coded or using an open source product such as our ODE.

But if Hyper Generalisation is even faster, why isn’t it being adopted at a greater rate?

One reason is that there are not a lot of commercial products around to fuel adoption. But the same can be said of products supporting the Hyper Normalisation models.

However, with Hyper Normalisation approaches such as Data Vault and Anchor modelling, you can easily build your own hand-coded solution once you understand the design pattern.
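
As a toy example (my own sketch, not how ODE is written), here’s how little it takes to start generating loading code once the pattern is understood: every Hub load is the same statement with different metadata plugged in.

    # Toy illustration only: generating a Hub load statement from metadata.
    HUB_LOAD_TEMPLATE = """
    INSERT INTO {hub_table} ({hub_key}, {business_key}, load_dts, record_source)
    SELECT DISTINCT
           {hash_expr}       AS {hub_key},
           stg.{business_key},
           CURRENT_TIMESTAMP AS load_dts,
           '{source}'        AS record_source
    FROM   {staging_table} stg
    WHERE  NOT EXISTS (
             SELECT 1 FROM {hub_table} h
             WHERE  h.{business_key} = stg.{business_key}
           )
    """

    def build_hub_load(hub_table, hub_key, business_key, staging_table, source):
        """Render the standard Hub load pattern for one source feed."""
        return HUB_LOAD_TEMPLATE.format(
            hub_table=hub_table,
            hub_key=hub_key,
            business_key=business_key,
            hash_expr=f"md5(stg.{business_key})",  # key hashing varies by platform
            staging_table=staging_table,
            source=source,
        )

    print(build_hub_load("hub_customer", "customer_hk", "customer_code",
                         "stg_crm_customer", "CRM"))

Satellites and Links follow the same idea with their own templates, which is exactly what makes the approach so automatable.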

And for me that is why it seems to be gaining adoption at a faster rate. After all, the people adopting it typically come from a Data Warehousing background and are used to writing code, rather than an ERP or CRM background where they are used to configuring applications.

Here in New Zealand, it would be fair to say we have a real habit of building our own, rather than buying a pre-canned solution, so I think Hyper Normalisation will prevail here.

2016: the year of DW Automation?

I think so.

In my view the Big Data hype has finally started to subside, and we can go back to focussing on making Data Warehousing, Analytics and Business Intelligence delivery faster and less risky. DW professionals are looking for ways to deliver better, and modelling approaches like Data Vault enable them to do this.

I think we will see some of the analyst organisations that drive the hype cycles talking about DW Automation, and that will drive its adoption as well.

And of course, once the analyst organisations start talking about it, the ETL vendors will jump on the bandwagon and start changing their marketing material to position their offerings as delivering DW Automation. Hell, who knows, they may even add some features to their products that make it so.

So I say bring it on, let’s get the DW automation party started, and move one step closer to Hyper adoption.

Start your AgileBI journey with us.
