Machine, Process and Human Generated Data

by Oct 17, 2016

BI Modal Courses and Conferences

When I can make the time, I find it useful to attend conferences and courses. I try to mix up what I attend, combining events inside the BI domain with events outside it, as I always seem to find something interesting and relevant at both. My all-time fav conference is still Webstock; it’s one you should definitely add to your conference bucket list.

Sometimes attending a course reinforces how much I love the courses we run over at OptimalBI, thanks to the high level of hands-on content we include (nothing like sitting and listening to somebody speak to PowerPoint for a few days to reinforce this). Sometimes it gives me time to zone out and think through some of the things I have been working on, on and off, for a while, and an “ah ha” moment arrives.

But there are always a few key new things I take away from the course itself.

My Latest Takeaway

Last month I managed to make some time so I jetted off to Sydney to attend the “Building the Data-Driven Business” course which was delivered by Barry Devlin and hosted by Ian from BI Ready.

One of the things I took away from this course, and have used consistently since, is the concept of Machine, Process, and Human Generated data.

The concept is explained in Barry’s book Business unIntelligence, and there is also a great summary available in a white paper Barry wrote for Teradata in 2013, Unlocking Machine-Generated Data. An excerpt from this paper:

1. Human-sourced information: All information ultimately originates from people, an artifact of the human mind. This information is the highly subjective record of human experiences, previously recorded in books and works of art, and later in photographs, audio and video. Human-sourced information is now almost entirely digitized and electronically stored everywhere from tweets to movies. Loosely structured and often ungoverned, this information may not reliably represent for the business what has happened in the real world. Structuring and standardization (e.g. modeling, validation in operational systems and cleansing as data moves to BI) allows the business to convert human-sourced information to more reliable process-mediated data.

2. Process-mediated data: Business processes are at the heart of running and managing every business. These processes record and monitor business events of interest, such as registering a customer, manufacturing a product, taking an order, etc. The process-mediated data thus collected is highly structured and includes transactions, reference tables and relationships, as well as the metadata that sets its context. Process-mediated data has long been the vast majority of what IT managed and processed, in both operational and BI systems. Its highly structured and regulated form is well suited to promoting information management and data quality, as well as for storage and manipulation in relational database systems.

3. Machine-generated data: All the name-value pair data in the previous examples falls into this category. Its source is the sensors, machines and computers used to measure and record the events and situations in the physical world. From simple sensor records to complex web logs, machine-generated data is well-structured and usually considered to be highly reliable, provided that the occurrence of faulty sensors and missing data is accounted for. As sensors proliferate and eCommerce becomes pervasive, driving ever larger data volumes, machine-generated data is becoming an increasingly important component of the information stored and processed by many businesses. Its well-structured nature is amenable to computer processing. It is sometimes claimed that its size and speed is beyond the abilities of traditional RDBMS, mandating NoSQL data stores. However, high-performance relational databases are regularly used for such data.

A Great Concept

The reason I love this concept is that I can easily map these three styles of generated data to technologies that are designed to manage them.

Machine Generated Data, such as sensor readings, can be acquired, loaded and stored with “Big Data” technologies such as Spark and Hadoop. Heavens to Betsy, we can even call it a “Data Lake.”

Process Generated Data, such as CRM data stored in a relational database, can be acquired using Change Data Capture (CDC) technologies and stored in an MPP-style relational database.

Human Generated Data, such as Twitter and Facebook posts, can use similar patterns to the Machine Generated Data. But we might use a different pattern, say one that uses MongoDB rather than Hadoop.

Differentiating the three data generation patterns lets me easily articulate why we might need three different technology patterns.
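To make the mapping concrete, here is a minimal sketch in Python of how the three data origins could be tied to a technology pattern. It is only an illustration of the idea, not a real framework: the names `Origin`, `Pattern`, `SOURCES` and `pattern_for` are all hypothetical, the source catalogue is made up, and the technologies listed are just the examples mentioned in this post.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    """The three data origins from Barry Devlin's classification."""
    MACHINE = "machine-generated"
    PROCESS = "process-mediated"
    HUMAN = "human-sourced"


@dataclass(frozen=True)
class Pattern:
    """A technology pattern: how the data is acquired and where it is stored."""
    acquire: str
    store: str


# Illustrative mapping only, using the example technologies named in the post.
PATTERNS = {
    Origin.MACHINE: Pattern(
        acquire="batch or stream ingest (e.g. Spark)",
        store="data lake (e.g. Hadoop)",
    ),
    Origin.PROCESS: Pattern(
        acquire="Change Data Capture (CDC)",
        store="MPP relational database",
    ),
    Origin.HUMAN: Pattern(
        acquire="API or feed ingest",
        store="document store (e.g. MongoDB)",
    ),
}

# Hypothetical source catalogue: each source is tagged with its origin.
SOURCES = {
    "factory_sensors": Origin.MACHINE,
    "crm": Origin.PROCESS,
    "twitter": Origin.HUMAN,
}


def pattern_for(source: str) -> Pattern:
    """Look up the technology pattern for a catalogued source."""
    return PATTERNS[SOURCES[source]]


if __name__ == "__main__":
    for name in SOURCES:
        p = pattern_for(name)
        print(f"{name}: acquire via {p.acquire}, store in {p.store}")
```

The point of tagging each source with an origin first, rather than with a technology, is that the conversation starts with how the data is generated and the tool choice follows from that.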

Patterns Count

There seems to have been a movement in the past to use these newer data technologies such as Hadoop as the only option, effectively trying to use them for patterns they were not designed for.

There is no longer one BI tool to rule them all, with most organisations I work with now moving to a BI Toolbox concept. The same should be the case for data acquisition, using the right tool for the task. The world of BI is once again becoming a world of hybrid solutions where we need to focus on picking the right technology pattern for the data and then spend the rest of our time determining how to integrate and govern these solutions.

As I write this I think I might have coined a new acronym, Acquire, Load and Store (ALS). I wonder if it will become the new ETL/ELT 😉

So thank you, Barry, for the Machine, Process, and Human data concept; it made the course worthwhile.

Oh and One More Thing

One of the other things I got out of the course was an “ah ha” moment around moving from the layer cake style data architecture diagrams I have been doing for years to one I have called Service Orientated Data Architecture (SODA), but that’s going to have to wait for another post.

Change, learn or fade away, it’s your choice – Shane

Shane blogs about all of the things data and business intelligence. 
